Matrix Algebra for 
Physicists 


This book is designed to introduce 
students of physics to matrix 
algebra. It will be particularly 
valuable for courses leading to 
B.Sc. and similar examinations. 


The first half is devoted mainly to 
the theories of matrices, determinants 
and vectors. Readers are 
assumed to have an elementary 
knowledge of vectors and complex 
numbers; but apart from that 
arguments involve hardly anything 
other than addition and 
multiplication. 


The rest of the book presents the 
application of matrices in various 
fields of physics. There are sufficient 
worked examples at every 
stage and the theoretical chapters 
are accompanied by problems. 


PREFACE 


This introduction to matrix algebra will be found of particular 
value to students working for B.Sc. and similar examinations. 

The matrix algebra commonly used in physics is presented in 
Chapters 1-7, but for a preliminary survey it is sufficient to consider 
only Sections 1-5, 7, 11, 15, 16, 19-23 and 25. 

Readers are assumed to have an elementary knowledge of vectors 
and complex numbers. Apart from that, arguments involve hardly 
anything but addition and multiplication. 

The application of matrices to various fields of physics is presented 
in the second half of the book. A number of examples are worked 
out in detail. They require a fair knowledge of the calculus; in some 
places even Fourier transforms and contour integration are used. 
No instructions for this kind of analysis are given since it happens to 
be better known than algebraic methods, even though the latter are 
more elementary. 


CHAPTER 1 


VECTORS 




1. Definition 


The mathematics presented in this volume are concerned with entities 
distinct from simple numbers. They require for their specification 
ordered sets of numbers rather than a single number. Ordinary 
vectors are entities of this kind: they are usually thought of as having 
a well-defined length and orientation and are denoted by arrows. In 
the present context it is more important to think of vectors in terms 
of their Cartesian components. A single vector is a concise expression 
for three quantities and an equation between vectors conveys the 
information otherwise conveyed by three equations. By the use of 
vectors mathematical expressions are reduced in length, numbers of 
equations are diminished and the content of mathematical arguments 
is given greater clarity by the shedding of inessentials. 

The technique of substituting a single symbol for specified sets of 
quantities is by no means restricted to vectors. It is of particular use 
if applied to entities which require a large number of quantities for 
their specification. The mathematics of these entities might be 
unmanageable without the use of some formalism by which cumbersome 
expressions are contracted. Part of the formalism consists in rules for 
combining symbols, similar to the rules of vector algebra. 

In this volume we consider entities called ‘vectors’ (a generalization 
of the familiar concept) and ‘matrices’. Entities of this kind play an 
increasingly important part in contemporary physics. 

The rules by which vectors and matrices are combined are concise 
prescriptions which indicate repeated additions and multiplications. 
The subject is accordingly of an elementary character. Its only 
‘advanced’ aspect is the continual use of complex numbers. Results 
of geometry or of the calculus enter only into subsidiary arguments 
and into applications. 

The subject is conveniently introduced by the definition of ‘abstract 
vectors’ (or just ‘vectors’) which are a generalization of the familiar 
space vectors. 


DEFINITION (1.A) A ‘vector in n dimensions’ is an ordered set of n 
real or complex numbers to which are applied certain well-defined 
rules. The numbers themselves are called components of the vector. 


If in a set of vectors some particular component or components of 
every vector should vanish the number of dimensions of these vectors 
might be ambiguous. This ambiguity will not affect any conclusions 
to be derived. It is convenient to define a null vector the components 
of which are zero throughout. Its dimension is consistent with the 
context in which it appears and it is usually denoted by O. 

Vectors are specified by writing the components in the appropriate 
order in a row or in a column. At a later stage it will sometimes be 
necessary to distinguish between row and column vectors. In general 
terms the components can be denoted by subscripts such as $a_1$, 
$a_2, \ldots, a_n$. A concise notation for a vector is called for: a bold faced 
letter will be used. Thus $\mathbf{a}$ denotes a vector, in particular a column 
vector; the corresponding row vector is denoted by $\tilde{\mathbf{a}}$. 

As an example the two sets 1, 2, 3, 4, 5 and 1, 3, 2, 4, 5 can be 
regarded as vectors in 5 dimensions; they are distinct by the order of 
the components although the numerical values of the components are 
the same. 

The term ‘scalar’ will occasionally be applied to numbers in order 
to distinguish them from vectors. Rules for calculating with vectors 
will be laid down by definition. Although this procedure may be 
axiomatic, the rules will nevertheless be readily accepted since they 
are familiar when applied to space vectors. 


DEFINITION (1.B) A vector is multiplied by a scalar by multiplying 
every single component of the vector by that scalar. The result is a 
vector. 


Thus $\mathbf{c} = \lambda\mathbf{a}$ has the components $c_j = \lambda a_j$ $(j = 1, 2, \ldots, n)$. 


DEFINITION (1.C) Two vectors are added by adding corresponding 
components. The result is a vector. 


Thus $\mathbf{p} = \mathbf{r} + \mathbf{s}$ has the components $p_j = r_j + s_j$ $(j = 1, 2, \ldots, n)$. 

In specifying a vector the components can be written as the 
coefficients of a polynomial or some other series. Thus the polynomial 

    $a_1 x + a_2 x^2 + \ldots + a_n x^n$ 

can be regarded as a vector with the components $a_j$. The meaning of 
the variable x is of minor significance. This representation has the 
advantage of complying automatically with the rules (1.B) and (1.C) 
if a polynomial is multiplied by a number or two polynomials are 
added. 
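
The rules (1.B) and (1.C) and the polynomial representation can be tried out in a few lines of Python. This is only an illustrative sketch: the function names `scale`, `add` and `poly` and the sample numbers are assumptions made here, not notation from the text.

```python
def scale(lam, a):
    """Rule (1.B): multiply every component of a by the scalar lam."""
    return [lam * a_j for a_j in a]


def add(r, s):
    """Rule (1.C): add corresponding components of r and s."""
    if len(r) != len(s):
        raise ValueError("vectors must have the same number of dimensions")
    return [r_j + s_j for r_j, s_j in zip(r, s)]


def poly(a, x):
    """Evaluate a_1 x + a_2 x^2 + ... + a_n x^n for the coefficient vector a."""
    return sum(a_j * x ** (j + 1) for j, a_j in enumerate(a))


a = [1, 2, 3]
b = [4, 0, -1]
x = 2  # arbitrary value; the meaning of x is of minor significance

# Adding or scaling the polynomials acts on the coefficient vectors
# exactly as (1.C) and (1.B) prescribe.
print(poly(add(a, b), x) == poly(a, x) + poly(b, x))  # True
print(poly(scale(5, a), x) == 5 * poly(a, x))         # True
```
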

The importance of space vectors in physics is common knowledge. 
Vectors other than space vectors are used in various contexts. If, for 
example, a quantity x is a periodic function of the time t with the 
period τ it can be expanded in a series 


    $x(t) = \sum_j b_j \exp(-2\pi i t j/\tau)$        (1.1) 


If two series of the form (1.1) with different coefficients but the same 
period are added or if one series is multiplied by a number the result 
is a different time function with the same period. Here the 
coefficients $b_j$ are conveniently regarded as components of a vector. If 
this formula is applied to a vibrating string it is seen that the 
amplitudes of the different harmonics are instances of an abstract vector. 
Generally the concept of vectors will be applied to sets of numbers 
for which addition in accordance with (1.C) has a meaning in physics. 

The concept of vectors may be generalized in such a way that 
vector components need not necessarily be numbers but can be 
entities of a different kind; examples are given in Sections 37 and 45. 


2. Linear dependence 


Given a set of vectors new vectors can be derived by multiplying 
the original vectors by scalars and then adding. The resulting vectors 
are called linear combinations of the original vectors. The vector 
(—6, 3, 0, 3), for example, is a linear combination of (—2, 1, —1, 2) 
and (2, —1, —2, 1) obtained by multiplying the first vector by 2, the 
second by −1, and adding. 
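
The numerical example can be checked with a short Python sketch. The helper name `linear_combination` is an assumption introduced for illustration only.

```python
def linear_combination(scalars, vectors):
    """Multiply each vector by its scalar and add the results component-wise."""
    n = len(vectors[0])
    return [sum(q * v[j] for q, v in zip(scalars, vectors)) for j in range(n)]


u = (-2, 1, -1, 2)
v = (2, -1, -2, 1)

# first vector multiplied by 2, second by -1, then added
print(linear_combination([2, -1], [u, v]))  # [-6, 3, 0, 3]
```
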

A linear combination of vectors may be a null vector even if the 
vectors themselves are non-zero. This possibility will now be 
investigated in some detail. 

First consider space vectors. Two vectors determine the orientation 
of a plane to which both are parallel. The sum of these vectors is also 
parallel to the plane. Multiplication of the two vectors by scalars can, if 
negative factors are used, change the direction of the vectors within 
the plane but the orientation of the plane would not be affected. It 
follows that all linear combinations of the two vectors are parallel to 
one and the same plane. 

A vector which is not parallel to the plane cannot be parallel to any 
linear combination of the two original vectors. Thus three space 
vectors which are not coplanar cannot form a vanishing linear 
combination. On the other hand it is plausible to assume that three 
coplanar space vectors can form a vanishing linear combination. This 
will eventually be proved. 

In fact, two forces acting on a point can be balanced by a force 
parallel to the same plane, whatever the magnitude of the forces may 
be; they could not be balanced by a force which is not parallel to the 
plane of the two other forces. 

Turning now to abstract vectors let $\mathbf{a}^{(1)}, \mathbf{a}^{(2)}, \ldots, \mathbf{a}^{(m)}$ and 
$q^{(1)}, q^{(2)}, \ldots, q^{(m)}$ be a set of vectors and of scalars respectively. 


DEFINITION (2.A) If the numbers $q^{(r)}$ can be chosen in such a way 
that the linear combination 

    $q^{(1)}\mathbf{a}^{(1)} + q^{(2)}\mathbf{a}^{(2)} + \ldots + q^{(m)}\mathbf{a}^{(m)} = \mathbf{O}$ 

the vectors are said to be linearly dependent. If a choice of this kind 
is impossible the vectors are ‘linearly independent’. 
In this definition it is understood that not all of the $q^{(r)}$ are zero. 


THEOREM (2.B) n + 1 vectors in n dimensions are linearly dependent. 


This is proved by induction. It is assumed that the theorem has 
been established for n vectors in n − 1 dimensions; its validity for 
n + 1 vectors in n dimensions is deduced. 

If n out of the n + 1 vectors are linearly dependent the theorem is 
proved by choosing n out of the n + 1 scalars q in such a way that 
the corresponding linear combination of n vectors vanishes; the 
remaining scalar q is made zero. 

If all sets of n vectors are linearly independent then there must be 
at least one out of the n + 1 vectors which has a non-vanishing first 
component. If that were not so, all n + 1 vectors would have only 
n − 1 non-vanishing components and could not be distinguished 
from vectors in n − 1 dimensions; they would be linearly dependent 
by premises. The vectors are labelled or, if necessary, will be 
relabelled in such a way that the component $a_1^{(n+1)}$ does not vanish. 

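A new set of vectors is now defined: 

    $\mathbf{b}^{(r)} = a_1^{(n+1)}\mathbf{a}^{(r)} - a_1^{(r)}\mathbf{a}^{(n+1)}$    $(r = 1, 2, \ldots, n)$        (2.1) 

The first components of these vectors vanish: 

    $b_1^{(r)} = a_1^{(n+1)}a_1^{(r)} - a_1^{(r)}a_1^{(n+1)} = 0$ 

According to premises the n vectors $\mathbf{b}^{(r)}$ are linearly dependent. It is 
accordingly possible to choose the numbers $p^{(1)}, p^{(2)}, \ldots, p^{(n)}$ in such 
a way that 

    $\sum_{r=1}^{n} p^{(r)}\mathbf{b}^{(r)} = \sum_{r=1}^{n} p^{(r)}\left[a_1^{(n+1)}\mathbf{a}^{(r)} - a_1^{(r)}\mathbf{a}^{(n+1)}\right] = \mathbf{O}$ 

Hence by putting 

    $q^{(r)} = p^{(r)}a_1^{(n+1)}$    $(r = 1, 2, \ldots, n)$ 

    $q^{(n+1)} = -\sum_{r=1}^{n} p^{(r)}a_1^{(r)}$ 

a vanishing linear combination of the vectors $\mathbf{a}^{(r)}$ can be constructed. 
Not all of the $q^{(r)}$ vanish, since not all of the $p^{(r)}$ vanish and 
$a_1^{(n+1)} \ne 0$. This completes the induction proof. 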

In order to complete the proof of theorem (2.B) it is sufficient to show that it is true 
for two vectors in one dimension. If c and d are vectors in one 
dimension their linear combination gives 

    $q^{(1)}c + q^{(2)}d = 0$,  if  $q^{(2)}/q^{(1)} = -c/d$ 

Theorem (2.B) is accordingly shown to be valid. 

A corollary of (2.B) is that three coplanar space vectors are 
linearly dependent whereas in general three space vectors are linearly 
independent. 

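As the proof of theorem (2.B) involves rather abstract reasoning 
the argument is now illustrated by an example. Consider five vectors 
in four dimensions. 

    $\mathbf{a}^{(1)} = (-1, 1, 1, 1)$    $\mathbf{a}^{(2)} = (1, -1, 1, 1)$ 

    $\mathbf{a}^{(3)} = (1, 1, -1, 1)$    $\mathbf{a}^{(4)} = (1, 1, 1, -1)$        (2.2) 

    $\mathbf{a}^{(5)} = (1, 1, 1, 1)$ 

Then, by (2.1) 

    $\mathbf{b}^{(1)} = \mathbf{a}^{(1)} + \mathbf{a}^{(5)} = (0, 2, 2, 2)$ 
    $\mathbf{b}^{(2)} = \mathbf{a}^{(2)} - \mathbf{a}^{(5)} = (0, -2, 0, 0)$ 
    $\mathbf{b}^{(3)} = \mathbf{a}^{(3)} - \mathbf{a}^{(5)} = (0, 0, -2, 0)$ 
    $\mathbf{b}^{(4)} = \mathbf{a}^{(4)} - \mathbf{a}^{(5)} = (0, 0, 0, -2)$ 


It is readily verified that the linear combination 

    $\mathbf{b}^{(1)} + \mathbf{b}^{(2)} + \mathbf{b}^{(3)} + \mathbf{b}^{(4)} = \mathbf{O}$ 


and that 

    $q^{(1)} = q^{(2)} = q^{(3)} = q^{(4)} = 1$,    $q^{(5)} = -2$ 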
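
The steps of this example can be reproduced with a short Python sketch. It follows equation (2.1) and the prescription for the scalars q given in the proof. The variable names are assumptions, and the coefficients p = (1, 1, 1, 1) are taken from the statement above rather than computed.

```python
# The five vectors (2.2); the last one plays the part of a^(n+1).
a = [
    (-1, 1, 1, 1),
    (1, -1, 1, 1),
    (1, 1, -1, 1),
    (1, 1, 1, -1),
    (1, 1, 1, 1),
]
n = 4
last = a[n]  # its first component does not vanish

# Equation (2.1): b^(r) = a_1^(n+1) a^(r) - a_1^(r) a^(n+1)
b = [[last[0] * a[r][j] - a[r][0] * last[j] for j in range(n)] for r in range(n)]

# Coefficients of the vanishing combination of the b^(r), as stated in the text.
p = [1, 1, 1, 1]

# q^(r) = p^(r) a_1^(n+1) for r = 1..n, and q^(n+1) = -sum_r p^(r) a_1^(r)
q = [p[r] * last[0] for r in range(n)]
q.append(-sum(p[r] * a[r][0] for r in range(n)))

combination = [sum(q[r] * a[r][j] for r in range(n + 1)) for j in range(n)]

print(b)            # [[0, 2, 2, 2], [0, -2, 0, 0], [0, 0, -2, 0], [0, 0, 0, -2]]
print(q)            # [1, 1, 1, 1, -2]
print(combination)  # [0, 0, 0, 0]
```
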


n vectors or any smaller number of vectors in n dimensions may be 
linearly independent or dependent. So far no criterion has been given 
for deciding this alternative. 

If n linearly independent vectors in n dimensions are given it 
follows from theorem (2.B) that every vector in n dimensions can be 
represented as a linear combination of these vectors. The 
representation is of particularly simple form if the following basic vectors are 
used 

    $\mathbf{e}^{(1)} = (1, 0, \ldots, 0)$ 

    $\mathbf{e}^{(2)} = (0, 1, \ldots, 0)$ 

    $\ldots$        (2.3) 

    $\mathbf{e}^{(n)} = (0, 0, \ldots, 1)$ 


In representing any vector $\mathbf{a}$ as a linear combination of these basic 
vectors the coefficients are equal to the components $a_j$. 


3. Scalar product: Orthogonality 


The scalar product of two space vectors is defined as the product of 
their magnitudes and the cosine of the angle between them. It is 
equal to the sum of the products of their corresponding Cartesian 
components. If the vectors are perpendicular to each other the scalar 
product vanishes whatever the magnitudes of the vectors may be. 
The scalar product of a vector by itself is the square of the magnitude 
of the vector. 

Similar statements can be made with respect to abstract vectors. 
If the scalar product of two column vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted by 
$(\mathbf{a}\mathbf{b})$, then 

    $(\mathbf{a}\mathbf{b}) = \sum_{j=1}^{n} a_j b_j^{*}$        (3.1) 

where the asterisk denotes the conjugate complex. 

Readers are reminded that the conjugate complex of the number 
α + iβ is α − iβ (α, β being real) and vice versa, and the conjugate 
complex of a product of two complex numbers is the product of the 
conjugate complex factors, i.e. 


    $[(\alpha + i\beta)(\gamma + i\eta)]^{*} = (\alpha - i\beta)(\gamma - i\eta)$ 

The inequality of Schwarz 


If $\mathbf{a}$ and $\mathbf{b}$ are vectors then 

    $|(\mathbf{a}\mathbf{b})|^2 \le a^2 b^2$        (3.2) 

where $a^2 = (\mathbf{a}\mathbf{a})$ and $b^2 = (\mathbf{b}\mathbf{b})$. This relation can be used for 
finding an upper limit for scalar products. In order to prove (3.2) let 

    $S = a^2 b^2 - |(\mathbf{a}\mathbf{b})|^2 = \sum_{j,k}\left[|a_j|^2|b_k|^2 - a_j^{*}a_k b_j b_k^{*}\right]$        (3.3) 

and 

    $T = \sum_{j,k} |a_j b_k - a_k b_j|^2$        (3.4) 

      $= \sum_{j,k} (a_j b_k - a_k b_j)(a_j^{*}b_k^{*} - a_k^{*}b_j^{*})$ 

      $= \sum_{j,k}\left[|a_j|^2|b_k|^2 + |a_k|^2|b_j|^2 - a_j^{*}a_k b_j b_k^{*} - a_j a_k^{*} b_j^{*} b_k\right]$ 

      $= 2\sum_{j,k}\left[|a_j|^2|b_k|^2 - a_j^{*}a_k b_j b_k^{*}\right]$ 

In (3.3) and (3.4) summations are performed with respect to j and to 
k and the sums are to be taken from 1 to n. On account of its 
definition as a sum of squared moduli $T \ge 0$. By (3.3) and (3.4) T = 2S. Hence 
$S \ge 0$ so that the inequality (3.2) is proved. 
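
The quantities S and T of the proof can be evaluated numerically for a pair of complex vectors. The sketch below assumes that the conjugate in the scalar product is taken on the components of the second vector. The function name `scalar_product` and the sample vectors are illustrative assumptions.

```python
def scalar_product(a, b):
    """Scalar product with the conjugate taken on the second vector (assumed convention)."""
    return sum(a_j * complex(b_j).conjugate() for a_j, b_j in zip(a, b))


a = [1 + 2j, -1j, 3]
b = [2, 1 - 1j, 1j]

# S = a^2 b^2 - |(ab)|^2, as in (3.3)
S = (scalar_product(a, a) * scalar_product(b, b)).real - abs(scalar_product(a, b)) ** 2

# T = sum over j and k of |a_j b_k - a_k b_j|^2, as in (3.4)
T = sum(
    abs(a_j * b_k - a_k * b_j) ** 2
    for a_j, b_j in zip(a, b)
    for a_k, b_k in zip(a, b)
)

print(S, T)                    # T comes out as twice S, up to rounding
print(abs(T - 2 * S) < 1e-9)   # True
print(S >= 0)                  # True, which is the inequality (3.2)
```
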

The scalar product of abstract vectors has no simple geometrical 
meaning. It is nevertheless convenient to call any pair of abstract 
vectors ‘orthogonal’ if their scalar product vanishes. 


THEOREM (3.A) If a set of vectors is linearly dependent, none of 
these vectors can be orthogonal to all other vectors of the set. 


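This is seen as follows: let 

    $\sum_j c_j \mathbf{a}^{(j)} = \mathbf{O}$,    $c_k \ne 0$ 

If $\mathbf{a}^{(k)}$ is a non-vanishing vector out of the set whose coefficient $c_k$ 
does not vanish, then scalar multiplication by $\mathbf{a}^{(k)}$ results in 

    $\sum_{j \ne k} c_j(\mathbf{a}^{(j)}\mathbf{a}^{(k)}) + c_k(\mathbf{a}^{(k)}\mathbf{a}^{(k)}) = 0$ 

As the last term cannot vanish there must be at least one 
non-vanishing scalar product in the sum, thus proving the theorem. 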


Schmidt’s method of orthogonalization 


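Given a set of linearly independent vectors $\mathbf{a}^{(1)}, \ldots, \mathbf{a}^{(m)}$ it is possible 
to derive an equal number of mutually orthogonal vectors ($\mathbf{b}^{(1)}, \ldots, \mathbf{b}^{(m)}$) 
by linear combinations of the original vectors. These vectors are 
determined by the following set of equations 


    $\mathbf{b}^{(1)} = \mathbf{a}^{(1)}$ 

    $\mathbf{b}^{(2)} = \mathbf{a}^{(2)} - \dfrac{(\mathbf{a}^{(2)}\mathbf{b}^{(1)})}{(\mathbf{b}^{(1)}\mathbf{b}^{(1)})}\mathbf{b}^{(1)}$        (3.5) 

    $\mathbf{b}^{(r)} = \mathbf{a}^{(r)} - \sum_{j=1}^{r-1} \dfrac{(\mathbf{a}^{(r)}\mathbf{b}^{(j)})}{(\mathbf{b}^{(j)}\mathbf{b}^{(j)})}\mathbf{b}^{(j)}$ 


These formulae are proved by induction. It is assumed that 
$\mathbf{b}^{(1)} \ldots \mathbf{b}^{(r-1)}$ are mutually orthogonal. On account of this 
assumption all except two terms of $(\mathbf{b}^{(r)}\mathbf{b}^{(s)})$, with s < r, vanish. The two non-vanishing 
terms are $(\mathbf{a}^{(r)}\mathbf{b}^{(s)})$ and 

    $-\dfrac{(\mathbf{a}^{(r)}\mathbf{b}^{(s)})}{(\mathbf{b}^{(s)}\mathbf{b}^{(s)})}(\mathbf{b}^{(s)}\mathbf{b}^{(s)})$ 

Thus the two terms cancel, showing that $\mathbf{b}^{(r)}$ is perpendicular to all 
$\mathbf{b}^{(s)}$. As $(\mathbf{b}^{(r)}\mathbf{b}^{(s)})$ is seen to be zero the proof is completed. 

In this argument no use is made of the linear independence of the 
vectors $\mathbf{a}^{(r)}$. However, by theorem (3.A) there cannot be any linear 
dependence of the $\mathbf{b}^{(r)}$. In fact, if the $\mathbf{a}^{(r)}$ are linearly dependent at 
least one of the $\mathbf{b}^{(r)}$ would be identically zero. 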
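
Schmidt's procedure translates directly into a loop. The following Python sketch again assumes that the conjugate in the scalar product is taken on the second vector. The names `schmidt` and `scalar_product`, the tolerance and the sample vectors are illustrative assumptions.

```python
def scalar_product(a, b):
    """Scalar product with the conjugate taken on the second vector (assumed convention)."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def schmidt(vectors, tol=1e-12):
    """Equations (3.5): remove from each a^(r) its parts along the earlier b^(j)."""
    bs = []
    for a in vectors:
        b = [complex(x) for x in a]
        for prev in bs:
            norm2 = scalar_product(prev, prev).real
            if norm2 <= tol:  # an earlier b^(j) that vanished contributes nothing
                continue
            coeff = scalar_product(a, prev) / norm2
            b = [x - coeff * y for x, y in zip(b, prev)]
        bs.append(b)
    return bs


bs = schmidt([(1, 1j, 0), (1, 0, 1), (0, 1, 1)])
for r in range(len(bs)):
    for s in range(r):
        print(r + 1, s + 1, abs(scalar_product(bs[r], bs[s])) < 1e-9)  # True for every pair

# For linearly dependent input one of the resulting vectors vanishes.
print(schmidt([(1, 0), (2, 0)])[1])  # both components are zero
```
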

The rules of vector algebra given so far are of limited scope. If it is to be 
extended it is necessary to define functional relations between 
vectors other than that of linear combination. That will be achieved 
in the next chapter. 


EXERCISES 

1. If complex numbers are regarded as vectors in two dimensions 
and real numbers as scalars, show that the rules (1.B) and (1.C) 
apply to complex numbers. 

2. Find the components of a vector $\mathbf{x}$ which satisfies the equation 

    $2\mathbf{a} - 3\mathbf{x} = \mathbf{b}$ 

where $\mathbf{a}$ and $\mathbf{b}$ have the components (4, −1, 3, −2) and (−2, 1, 0, −1) 
respectively. 

3. Show that the vectors (−1, 0, 1); (0, 1, 1); (1, 1, 0) are linearly 
dependent and that the vectors (1, 1, 1, 1); (1, i, −1, −i); (1, −i, −1, i) 
are mutually orthogonal. 

4. Four vectors in three dimensions are defined as the coefficients 
of the polynomials 

    $f = 1 + 2x + 3x^2$        $g = -4 + 3x + 3x^2$ 
    $h = 2 - 2x^2$             $k = 4 + 5x - 6x^2$ 

Show that they are linearly dependent by proving that 

    $-2f + 48g + 153h - 28k = 0$ 


CHAPTER 2 


MATRICES 


4. Nomenclature 

DEFINITION (4.A) A matrix is an ordered array of real or complex 
numbers aligned in rows and columns so that they form a rectangle, 
to which are applied certain well-defined rules. 


If the number of rows is equal to the number of columns the array 
is called a square matrix. If the array consists of a single row or a 
single column it is a row vector or column vector. In this book the 
term matrix is applied to square matrices only unless any other 
meaning is explicitly stipulated. A matrix is said to be of nth order or 
n-dimensional if it consists of n rows and n columns. 

A matrix can be specified by writing the array of numbers explicitly 
and enclosing it in a square bracket as shown below 


§  =3 442 
6—i 0 7 
1 a. 4g 


The numbers of which the matrix is constituted are called matrix 
elements and can be denoted by a symbol with two subscripts, such 
as $a_{jk}$. The first subscript denotes the row, the second the column in 
which the matrix element stands; j and k are whole numbers ranging 
from 1 to n in ascending order from left to right and from top to 
bottom. In the present example $a_{11} = 5$, $a_{12} = -3$, $a_{21} = 6 - i$, 
$a_{33} = -9$. An alternative notation for matrix elements is <j | a | k > 
or just a(j, k). 


By deleting an equal number of rows and columns of any matrix 
a ‘sub-matrix’ of smaller number of rows and columns is defined; in 
this definition it is understood that the order of the remaining rows 
and columns is not modified. A single matrix element can be regarded 
as a one-dimensional sub-matrix. 
A matrix can also be regarded as an array of row vectors or of 
column vectors. They will be denoted by $\mathbf{a}_{j\cdot}$ and $\mathbf{a}_{\cdot k}$ respectively. 

Characters like A will be used to denote the matrix as distinct 
from matrix elements. In handwritten work it is necessary to 
denote vectors and matrices in a way which does not depend upon 
printing techniques. This may be done by underlining symbols for 
vectors and double underlining symbols for matrices. 

Simple matrices may have only two rows and columns, but there is 
no upper limit for the number of rows and columns. In this book the 
number of rows and columns will be considered to be finite although 
the concept of infinite matrices is quite common. 

The ‘leading diagonal’ (or just ‘diagonal’) of a matrix consists of 
those elements $a_{jk}$ for which j = k. A matrix in which all elements 
outside the main (i.e. leading) diagonal vanish is called a diagonal 
matrix. An important special type of diagonal matrix is that in 
which all diagonal elements are equal to unity. A matrix of this kind 
is called a unit matrix and denoted by I. Alternatively the notation 
$\delta_{jk}$ (‘Kronecker symbol’) is in use, implying that it is zero for j ≠ k 
and unity for j = k. Matrices in which all elements are zero are 
called null matrices and conveniently denoted by O. 

To every matrix a ‘transposed’ matrix can be constructed by 
interchanging rows with columns, the main diagonal remaining 
unchanged. The transposed matrix to A is denoted by $\tilde{A}$ and $\tilde{a}_{jk} = a_{kj}$. 
The conjugate complex to the transposed matrix is called the matrix 
‘adjoint’ to A and is denoted by $A^{\dagger}$. This term is taken from the 
adjoint of linear differential operators which are in various respects 
closely related to matrices. In the mathematical literature the term 
‘adjoint’ is frequently used for ‘adjugate’, a term to be explained later. 
Our term ‘adjoint’ is then replaced by ‘Hermitean conjugate’. This 
nomenclature is unsuitable for our purposes and will not be used in 
this book. The matrix elements of adjoint matrices are related by 


    $a^{\dagger}_{jk} = a^{*}_{kj}$ 

The sum of all the diagonal elements of a matrix is called its ‘trace’. 


The terminology introduced in this section is the basic vocabulary 
of matrix algebra. It will be used in all arguments to follow. 
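
The operations named in this section can be written out for a matrix stored as a list of rows. The sketch below is illustrative only: the function names and the sample matrix are assumptions, and indices start at 0 in the code although they start at 1 in the text.

```python
def transpose(A):
    """Interchange rows with columns: element (j, k) of the result is a_kj."""
    return [list(column) for column in zip(*A)]


def adjoint(A):
    """Conjugate complex of the transposed matrix."""
    return [[complex(x).conjugate() for x in row] for row in transpose(A)]


def trace(A):
    """Sum of the elements on the leading diagonal."""
    return sum(A[j][j] for j in range(len(A)))


A = [[1 + 1j, 2],
     [3j, 4 - 2j]]

print(transpose(A))  # [[(1+1j), 3j], [2, (4-2j)]]
print(adjoint(A))    # rows (1-1j), -3j and 2, (4+2j)
print(trace(A))      # (5-1j)
```
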


5. Matrices and vectors 


Two matrices are equal if every element of the first is equal to the 
corresponding element of the second matrix. Thus the equation 
A = B means that $a_{jk} = b_{jk}$ for all values of j and k. A single 
equation between matrix symbols is a concise expression for $n^2$ 
equations between numbers. 

The rules for the addition of matrices and the multiplication of 
matrices by numbers (i.e. scalars) are almost identical with the 
corresponding rules for vectors. 


DEFINITION (5.A) A matrix is multiplied by a scalar by multiplying 
every single element of the matrix by the scalar. The result is also a 
matrix of the same size as the original matrix. 


DEFINITION (5.B) Two matrices are added by adding their 
corresponding elements. The result is a matrix. The original matrices must 
be of the same size, however. 


On account of the rule for addition the equation A = B can be 
written as A − B = O. 

It follows from (5.A) and (5.B) that linear combinations of 
matrices can be formed which are again matrices. 

The above rules of matrix algebra will in due course be 
supplemented by rules for multiplying matrices by other matrices. 
With these rules mathematical relations between matrices can be 
established which are to some extent similar to mathematical 
relations between numbers. This will, however, not be followed up at 
the present stage because it would be too remote from conventional 
mathematics and its use in physics. At present it will be shown that 
relations involving matrices and vectors are closely connected with 
familiar algebra. 
Consider for this purpose the set of simultaneous equations 


    $a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = y_1$ 
    $a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = y_2$ 
    $\ldots$ 
    $a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nn}x_n = y_n$        (5.1) 


In these equations the number of terms in each equation on the 
left-hand side is equal to the number of equations. This need not be 
so, but we shall not consider simultaneous equations of a more 
general type here. 

Obviously the quantities $x_1 \ldots x_n$ and $y_1 \ldots y_n$ can be regarded as 
vectors $\mathbf{x}$ and $\mathbf{y}$. $\mathbf{y}$ is obviously a column vector and it is assumed that 
$\mathbf{x}$ is also a column vector. The coefficients $a_{jk}$ are elements of a 
matrix A. Equations (5.1) can be written in the compact form 

    $\sum_{k=1}^{n} a_{jk}x_k = y_j$    $(j = 1, 2, \ldots, n)$        (5.2) 

Hence, equations (5.1) can be reduced to a single equation if the 
multiplication of a column vector by a matrix is defined in accordance 
with the left-hand side of (5.2). 


DEFINITION (5.C) The product of a matrix and a column vector is 
the column vector $A\mathbf{x}$ which has the components 


    $\sum_k a_{1k}x_k,\ \sum_k a_{2k}x_k,\ \ldots,\ \sum_k a_{nk}x_k$        (5.3) 


provided that the number of columns in A is equal to the number of 
elements in $\mathbf{x}$. 

With this definition (5.1) is written as 

    $A\mathbf{x} = \mathbf{y}$        (5.4) 

Equation (5.4) can also be expressed in terms of the transposed 
matrix $\tilde{A}$. In this case row vectors are required which have the same 
components as the column vectors $\mathbf{x}$ and $\mathbf{y}$. Then equations (5.2) and 
(5.4) are written as 


    $\sum_j x_j \tilde{a}_{jk} = y_k$        (5.5) 

and 

    $\tilde{\mathbf{x}}\tilde{A} = \tilde{\mathbf{y}}$        (5.6) 

and the multiplication rule (5.3) is modified in a similar way. 

Equation (5.4) shows that a matrix is an operator which 
transforms any vector into another vector. If a matrix operates on a linear 
combination of vectors, matrix-vector multiplication commutes with 
the vector-scalar multiplication and vector addition. 


    $A(\lambda\mathbf{p} + \mu\mathbf{q}) = \lambda(A\mathbf{p}) + \mu(A\mathbf{q})$        (5.7) 

On account of this relation matrices are called ‘linear operators’. 
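
The multiplication rule (5.3) and the linearity relation (5.7) can be checked on a small numerical case. In this Python sketch the function name `matvec`, the matrix and the vectors are assumptions chosen only for illustration.

```python
def matvec(A, x):
    """Rule (5.3): component j of Ax is the sum over k of a_jk x_k."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many elements as x")
    return [sum(a_jk * x_k for a_jk, x_k in zip(row, x)) for row in A]


A = [[1, 2, 0],
     [0, 1, -1],
     [3, 0, 2]]
p = [1, 0, 2]
q = [-1, 1, 1]
lam, mu = 2, -3

# left-hand side of (5.7): A acting on the linear combination
left = matvec(A, [lam * p_j + mu * q_j for p_j, q_j in zip(p, q)])

# right-hand side of (5.7): the same combination of Ap and Aq
right = [lam * u + mu * v for u, v in zip(matvec(A, p), matvec(A, q))]

print(left)   # [-1, -4, 17]
print(right)  # [-1, -4, 17]
```
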

Equations (5.1) or (5.4) can be interpreted by assuming that A and 
$\mathbf{y}$ are given and that the equations have to be solved for the 
‘unknowns’ $x_1 \ldots x_n$ or briefly, for the unknown vector $\mathbf{x}$. In physics 
equations of similar form occur in various contexts. 

If x and y are space vectors A is usually (but not necessarily) a 
tensor. For example A could be the dielectric tensor of an anisotropic 
medium if x and y were the electric field strength and the dielectric 
displacement respectively. In the thermodynamics of irreversible 
processes the thermodynamic forces and fluxes are regarded as 
abstract vectors and a matrix, representing the thermodynamic and 
transport properties of some material, links these vectors by an 
equation of the type (5.4). The most important application of 
matrices in classical physics concerns the theory of vibrations; this 
will be considered in Chapter 8. 


EXERCISES 


1. Given the matrix A 
I1+i OO -- i 
2i 0 2i 2 
μας. μα 1 4 +- 3] 
0 1-i 2 -2--ἰ 


express the matrices $\tilde{A}$ and $A^{\dagger}$ in terms of the elements of A. Show 
that the traces of these matrices vanish. 

2. Given the vector x with the components (1, =, i, 1 +1) 
calculate Ax, where A is defined in the preceding exercise. 

3. Let the elements of a matrix be time functions. Derive from 
(5.A), (5.B) and the familiar definition of differential coefficients the 
differentiation rule for matrices according to which every matrix 
element is to be differentiated with respect to the time. 

4. Given a matrix A and two vectors u and v show that 

7 (Au) = (¥A)'u (5.8) 


(Expand both sides of the equation.) 




CHAPTER 3 


LINEAR EQUATIONS 


6. Numerical survey 


Problems and questions arising in connection with simultaneous 
linear equations are conveniently introduced by way of examples. 
Consider at first the system of equations 


at Mie ἢ 
Xe τ Sta ey = 6.1 
    $2x_1 - 3x_2 + 7x_3 = 7$ 


It is solved by $x_1 = 3$, $x_2 = 2$, $x_3 = 1$, as is conclusively shown by 
substituting these figures into the equations. The solution can also be 
found by eliminating two of the three unknowns, thus proving that 
the solution is unique. If the numbers on the right-hand sides are 
replaced by other numbers the method of elimination is again 
applicable and yields a unique answer. In particular, if the right-hand sides 
are 0, 0, 0 the solution is $x_1 = x_2 = x_3 = 0$. 
The system 


a eens 
    $x_1 - 2x_2 + 3x_3 = 2$        (6.2) 
    $3x_1 - 3x_2 + 7x_3 = 10$ 


is again solved by $x_1 = 3$, $x_2 = 2$, $x_3 = 1$ but there are 
alternative solutions such as $x_1 = 13$, $x_2 = -2$, $x_3 = -5$. It is possible 
to eliminate one of the unknowns but not two of them. If the 
right-hand sides of (6.2) are replaced by 0, 0, 0 the equations are again 
solved by $x_1 = x_2 = x_3 = 0$ but there are also non-vanishing 
solutions; they are not determined uniquely and are merely subject to the 
condition that $x_2/x_1 = -0.4$ and $x_3/x_1 = -0.6$. Thus $x_1 = 10$, 
$x_2 = -4$, $x_3 = -6$ is a solution. 

Now let the right-hand side of the third equation be 
replaced by 8, leaving all the other figures as they are. Eliminating $x_2$ 
from the first and second equation and then from the first and third 
equation two equations for $x_1$ and $x_3$ are found. 

    $3x_1 + 5x_3 = 14$ 

    $3x_1 + 5x_3 = 8$ 

They are obviously incompatible; the modified equations do not 
admit any solution. 

These examples show that simultaneous linear equations are not 
necessarily soluble and that solutions are not necessarily unique. 
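
The non-uniqueness found for the system (6.2) can be illustrated numerically. The Python sketch below uses only the second and third equations of (6.2), $x_1 - 2x_2 + 3x_3 = 2$ and $3x_1 - 3x_2 + 7x_3 = 10$. The function name `residuals` and the trial vectors are assumptions made for illustration.

```python
def residuals(x, rhs=(2, 10)):
    """Left-hand side minus right-hand side for the two equations considered."""
    x1, x2, x3 = x
    return (x1 - 2 * x2 + 3 * x3 - rhs[0],
            3 * x1 - 3 * x2 + 7 * x3 - rhs[1])


# Two different vectors satisfy the same pair of equations.
print(residuals((3, 2, 1)))      # (0, 0)
print(residuals((13, -2, -5)))   # (0, 0)

# Their difference solves the pair with right-hand sides 0, 0.
print(residuals((10, -4, -6), rhs=(0, 0)))  # (0, 0)
```
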


7. Homogeneous equations 


Equations of the form (5.1) or (5.4) are called systems of linear 
equations. Given the matrix A and the vector y they are to be solved 
for the unknown vector x. It will be assumed that A is a square 
matrix, meaning that the number of equations is equal to the number 
of unknowns. If the vector y is a null vector the equations are called 
homogeneous; otherwise they are non-homogeneous. The aim is 
to establish conditions for the solubility and the unique solubility of 
these systems. 
In considering the homogeneous equations 
Ax = 0 (7.1) 


it is convenient to express the column vector $\mathbf{x}$ in terms of its 
components and to regard the matrix as an array of column vectors; thus 

    $\mathbf{a}_{\cdot 1}x_1 + \mathbf{a}_{\cdot 2}x_2 + \ldots + \mathbf{a}_{\cdot n}x_n = \mathbf{O}$        (7.2) 

This equation admits the solution $\mathbf{x} = \mathbf{O}$. If equation (7.2) can be 
solved for non-vanishing $\mathbf{x}$ the existence of that solution establishes 
a linear dependence of the column vectors, in accordance with the 
definition (2.A). Thus non-vanishing solutions of homogeneous 
systems exist only if the column vectors of the matrix of 
coefficients are linearly dependent. Non-vanishing solutions of 
homogeneous systems cannot be unique. If $\mathbf{x} = \mathbf{u}$ is a solution then 
$\mathbf{x} = \lambda\mathbf{u}$ is also a solution, λ being any arbitrary scalar. It is 
sometimes convenient to adjust λ in such a way that $\mathbf{x}$ is ‘normalized’, 
meaning that the scalar product of $\mathbf{x}$ with itself is equal to unity. 


    $\lambda = \left[|u_1|^2 + |u_2|^2 + \ldots + |u_n|^2\right]^{-1/2}$        (7.3) 
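
The normalizing factor (7.3) can be computed directly. In the Python sketch below the function name `normalize` and the sample vector are illustrative assumptions.

```python
import math


def normalize(u):
    """Multiply u by the factor (7.3) so that its scalar product with itself is unity."""
    norm2 = sum(abs(u_j) ** 2 for u_j in u)
    if norm2 == 0:
        raise ValueError("a null vector cannot be normalized")
    lam = 1 / math.sqrt(norm2)
    return [lam * u_j for u_j in u]


x = normalize([3, 4j])
print(x)                                # approximately [0.6, 0.8j]
print(sum(abs(x_j) ** 2 for x_j in x))  # approximately 1
```
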
If equation (7.1) is soluble for any $\mathbf{x} \ne \mathbf{O}$ the equation 

    $\tilde{\mathbf{x}}A = \mathbf{O}$        (7.4) 

has also a non-vanishing row vector as a solution. From the solution 
$\mathbf{x} = \mathbf{u}$ of (7.2) it follows that 

    $a_{jn} = -\sum_{k=1}^{n-1} a_{jk}(u_k/u_n)$    $(j = 1, \ldots, n)$        (7.5) 

It is tacitly assumed that $u_n \ne 0$; this is not essential since $u_n$ and 
$a_{jn}$ can be replaced by a non-vanishing component of $\mathbf{u}$ and the 
corresponding matrix elements. 

The matrix elements 

    $a_{j1}, a_{j2}, \ldots, a_{j(n-1)}$    $(j = 1, \ldots, n)$ 

are components of (n − 1)-dimensional vectors. As their number is 
n, they are, by theorem (2.B), linearly dependent. Thus there exist n 
numbers $(v_1, v_2, \ldots, v_n)$ satisfying the equations 

    $\sum_{j=1}^{n} v_j a_{jk} = 0$    $(k = 1, 2, \ldots, (n-1))$        (7.6) 


Also, by (7.5) and (7.6) 

    $\sum_{j=1}^{n} v_j a_{jn} = -\sum_{j=1}^{n} v_j \sum_{k=1}^{n-1} a_{jk}(u_k/u_n)$        (7.7) 

        $= -\sum_{k=1}^{n-1} (u_k/u_n) \sum_{j=1}^{n} v_j a_{jk} = 0$ 

The subscript in equation (7.6) may, accordingly, take all the values 
k = 1, 2, ..., n. Hence 

    $\tilde{\mathbf{v}}A = \mathbf{O}$        (7.8) 

Thus it is shown that a solution $\tilde{\mathbf{x}} = \tilde{\mathbf{v}}$ of equation (7.4) exists. 
Equation (7.8) can be written in terms of the transposed matrix as 

    $\tilde{A}\mathbf{v} = \mathbf{O}$ 

Thus the columns of the transposed matrix are linearly dependent; 
as they are identical with the rows of A and as equation (7.1) can be 
written as $\tilde{\mathbf{x}}\tilde{A} = \mathbf{O}$ it is concluded that: 


The rows of a matrix are linearly dependent if and only if the 
columns are linearly dependent. 
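
The statement can be illustrated by a matrix whose columns and rows are dependent in different ways. In the Python sketch below the matrix, the vectors u and v and the function names are assumptions chosen so that the arithmetic is easy to follow; the vectors are given, not computed.

```python
def matrix_times_column(A, u):
    """Components of Au: a vanishing result expresses a dependence of the columns."""
    return [sum(a_jk * u_k for a_jk, u_k in zip(row, u)) for row in A]


def row_times_matrix(v, A):
    """Components of the row vector vA: a vanishing result expresses a dependence of the rows."""
    return [sum(v[j] * A[j][k] for j in range(len(A))) for k in range(len(A[0]))]


A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]

u = [1, 1, -1]   # first column + second column - third column vanishes
v = [2, -1, 0]   # twice the first row minus the second row vanishes

print(matrix_times_column(A, u))  # [0, 0, 0]
print(row_times_matrix(v, A))     # [0, 0, 0]
```
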


As homogeneous linear equations play an important part in matrix 
algebra the results of this section will be needed later. 


8. Non-homogeneous equations (i) 


The vector equation 

    $A\mathbf{x} = \mathbf{y}$        (8.1) 

can be written in a form similar to (7.2) as 

    $\mathbf{a}_{\cdot 1}x_1 + \mathbf{a}_{\cdot 2}x_2 + \ldots + \mathbf{a}_{\cdot n}x_n = \mathbf{y}$        (8.2) 


As $\mathbf{a}_{\cdot 1}, \mathbf{a}_{\cdot 2}, \ldots, \mathbf{a}_{\cdot n}, -\mathbf{y}$ are n + 1 vectors of n components they must 
be linearly dependent: thus there must be n + 1 numbers $(q_1 \ldots q_{n+1})$ 
complying with the equation 


    $\mathbf{a}_{\cdot 1}q_1 + \ldots + \mathbf{a}_{\cdot n}q_n - \mathbf{y}q_{n+1} = \mathbf{O}$ 

If $q_{n+1} \ne 0$ then the numbers $q_1/q_{n+1} \ldots q_n/q_{n+1}$ are the 
components of a vector $\mathbf{q}'$ and $\mathbf{x} = \mathbf{q}'$ is a solution of (8.1). 

It will be at first assumed that the columns of A are linearly 
independent. If the component $y_1 \ne 0$ the argument presented in 
Section 2 can be applied by writing $\mathbf{a}^{(n+1)}$ for $-\mathbf{y}$ and $\mathbf{a}^{(r)}$ for $\mathbf{a}_{\cdot r}$. 
Then the vectors $\mathbf{b}^{(r)}$ are determined by equation (2.1) and their 
linear dependence is established as previously. The number $q_{n+1}$ 
cannot vanish, since otherwise the columns of A alone would form a 
vanishing linear combination. Thus the solubility 
of equation (8.1) is proved. 

If the columns of the matrix A are linearly independent but $y_1 = 0$ 
the argument still applies but it is necessary to redefine the vectors 
$\mathbf{b}^{(r)}$. 

Assume that there are two solutions of (8.1) so that Au = y and
Au' = y. Then A(u - u') = 0, and (u - u') is accordingly a solution of
the homogeneous equation Ax = 0. Since the columns of A are
assumed to be linearly independent this homogeneous equation has
the unique solution u - u' = 0. Hence any two solutions of (8.1)
must be equal to each other; in other words solutions of (8.1) are
unique.

This is the type of equation which is usually considered in element- 
ary mathematics. 


9. Non-homogeneous equations (ii) 


Consider a set of equations of the form (8.1) with different vectors on 
the right-hand side but with the same matrix. It is again assumed that 
the column vectors are linearly independent of each other. It is 
attempted to find the dependence of the unknown vectors on the 
vector on the right-hand side. 

Denoting the right-hand sides of two vector equations by y and z
and the corresponding solutions by u and v respectively, then

$$Au = y, \qquad Av = z$$
and hence
$$Au + Av = y + z$$

Further if the right-hand side is $\lambda y$ the solution is $\lambda u$ and if the right-
hand side is $\lambda y + \mu z$ the solution is $\lambda u + \mu v$, $\lambda$ and $\mu$ being any
scalar factors. Thus the solution of equations of the type (8.1) can be
regarded as a rule by which the unknown vectors are derived from 
the vectors on the right-hand side. According to the relations 
listed here this rule corresponds to a linear dependence of a vector 
upon another vector. It can, accordingly, be represented by a matrix. 
Hence it should be possible to specify a matrix B of such a kind that 


$$By = x \tag{9.1}$$
expresses the solution of equation Ax = y in terms of the vector on
the right-hand side. In fact, solving equation (8.1) implies finding an 
expression for the matrix elements of B. The matrix B must be com- 
pletely determined by the matrix A, independently of y. Expressions 
for the matrix B will be considered later. 

It is necessary to consider equation (8.1) on the assumption that
the column vectors of A are linearly dependent. In this case equations
Ax = 0 and $\tilde{x}A = 0$ have non-vanishing solutions to be denoted by
u and $\tilde{v}$ respectively. Let x = z be a solution of the non-homogeneous
equation so that

$$Az = y \tag{9.2}$$
By (5.8) and (7.4)
$$\tilde{v}\cdot(Az) = (\tilde{v}A)\cdot z = 0$$
so that, by (9.2)
$$(\tilde{v}\cdot y) = 0 \tag{9.3}$$
It follows that simultaneous solubility of the non-homogeneous 
equations (9.2) and the homogeneous equations (7.4) restricts the 
admissible vectors y. This restriction is expressed by the equation 
(9.3) which says that column vectors on the right-hand side of a non- 
homogeneous equation must be orthogonal to the row vectors which 
are solutions of one of the homogeneous equations. If that condition
is not satisfied non-homogeneous equations are not soluble. If the
condition is complied with, the solutions are not unique since,
by Au = 0,
$$A(z + u) = y$$
defines an infinite set of solutions.
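
The solubility condition (9.3) can be checked numerically. The following is a minimal Python sketch; the singular matrix, the vectors and the helper names `mat_vec`, `vec_mat` and `dot` are illustrative assumptions and do not come from the text.

```python
# Illustrative sketch: all numbers and helper names are assumptions.

def mat_vec(a, x):
    """Column vector A x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]

def vec_mat(v, a):
    """Row vector v A."""
    return [sum(v[j] * a[j][k] for j in range(len(v))) for k in range(len(a[0]))]

def dot(v, y):
    """Scalar product of a row vector and a column vector."""
    return sum(p * q for p, q in zip(v, y))

A = [[1, 2], [2, 4]]   # the second column is twice the first
u = [2, -1]            # non-vanishing solution of A x = 0
v = [2, -1]            # non-vanishing solution of v A = 0

y_good = [3, 6]        # orthogonal to v, so condition (9.3) holds
y_bad = [3, 5]         # not orthogonal to v, so A x = y_bad is not soluble

z = [3, 0]                                      # one solution of A x = y_good
shifted = [z[i] + 5 * u[i] for i in range(2)]   # z plus a multiple of u

print(mat_vec(A, u), vec_mat(v, A))        # both are null vectors
print(dot(v, y_good), dot(v, y_bad))       # vanishing and non-vanishing
print(mat_vec(A, z), mat_vec(A, shifted))  # both reproduce y_good
```

Any multiple of u may be added to z, which is the infinite set of solutions mentioned above.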
In this chapter questions of solubility of linear equations are
answered. The numerical or algebraic construction of solutions will
not be considered but some general properties of solutions are
considered in Chapter 5.


10. 2 x 2 matrices 


The study of two-by-two matrices reveals on the one hand the 
essential features of matrix algebra and is on the other hand free of 
unnecessary complications. For this reason two-by-two matrices are 
considered in greater detail both in this section and later. 

As these matrices have four elements it will be attempted to
represent them as linear combinations of four 'basic' matrices.
Selection of these matrices is somewhat arbitrary, but the unit matrix
is an obvious first choice. In addition three matrices are adopted
which are applied in various fields of theoretical physics. The basic
matrices are

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$$

X, Y, and Z are known as 'spin matrices'. Every matrix in two
dimensions can be written as
$$A = pI + qX + rY + sZ \tag{10.2}$$
where p, q, r, s are numbers.
In order to apply (10.2) it is necessary to determine p, q, r and s in
accordance with the simultaneous equations
$$p - s = a_{11}, \qquad -iq + r = a_{12}$$
$$iq + r = a_{21}, \qquad p + s = a_{22}$$
It follows that
$$p = \tfrac{1}{2}(a_{11} + a_{22}), \qquad s = -\tfrac{1}{2}(a_{11} - a_{22})$$
$$r = \tfrac{1}{2}(a_{12} + a_{21}), \qquad q = -\tfrac{1}{2}i(a_{21} - a_{12})$$
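
The decomposition (10.2) can be tried out on a numerical matrix. The sketch below is only illustrative: it assumes the basic matrices implied by the four simultaneous equations above (X with elements -i and i off the diagonal, Y with unit off-diagonal elements, Z with diagonal elements -1 and 1), and the function names `decompose` and `recombine` are hypothetical.

```python
# Illustrative sketch; the basic matrices are inferred from the
# simultaneous equations for p, q, r, s and are an assumption.
I2 = [[1, 0], [0, 1]]
X = [[0, -1j], [1j, 0]]
Y = [[0, 1], [1, 0]]
Z = [[-1, 0], [0, 1]]

def decompose(a):
    """Return the coefficients p, q, r, s of A = pI + qX + rY + sZ."""
    p = (a[0][0] + a[1][1]) / 2
    s = (a[1][1] - a[0][0]) / 2
    r = (a[0][1] + a[1][0]) / 2
    q = -0.5j * (a[1][0] - a[0][1])
    return p, q, r, s

def recombine(p, q, r, s):
    """Build pI + qX + rY + sZ element by element."""
    return [[p * I2[j][k] + q * X[j][k] + r * Y[j][k] + s * Z[j][k]
             for k in range(2)] for j in range(2)]

A = [[1, 2], [3, 4]]
p, q, r, s = decompose(A)
print(p, q, r, s)
print(recombine(p, q, r, s))   # reproduces the elements of A
```

Note that q comes out imaginary for a real matrix that is not symmetric, because X has imaginary elements.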

Consider now an important function of matrix elements which is
called the determinant of the matrix. It will first be discussed in
connection with 2 x 2 matrices.

The determinant is defined as
$$\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{10.4}$$


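Important properties of determinants can be verified by the use of
equation (10.4), for example

$$\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1 \tag{10.5}$$
$$\begin{vmatrix} \lambda a_{11} & a_{12} \\ \lambda a_{21} & a_{22} \end{vmatrix} = \lambda \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \tag{10.6}$$
$$\begin{vmatrix} (a_{11} + a_{12}) & a_{12} \\ (a_{21} + a_{22}) & a_{22} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \tag{10.7}$$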


Equation (10.7) shows that determinants of different matrices may
be equal to each other.
The concept of determinants would have to be regarded as artificial
and abstruse were it not for their relations to the solution of linear
equations. Consider the system
$$a_{11}x_1 + a_{12}x_2 = y_1$$
$$a_{21}x_1 + a_{22}x_2 = y_2 \tag{10.8}$$

and assume that it admits a unique solution. Multiplying the first
equation by $a_{22}$, the second by $a_{12}$ and then subtracting eliminates $x_2$,
yielding an equation for $x_1$:
$$(a_{11}a_{22} - a_{12}a_{21})x_1 = y_1a_{22} - y_2a_{12} \tag{10.9}$$
Multiplying the first equation by $a_{21}$, the second by $a_{11}$ and sub-
tracting eliminates $x_1$ and yields an equation for $x_2$:

$$(a_{12}a_{21} - a_{11}a_{22})x_2 = y_1a_{21} - y_2a_{11} \tag{10.10}$$


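Equations (10.9) and (10.10) can be written, and accordingly solved,
in terms of determinants:

$$x_1 \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} y_1 & a_{12} \\ y_2 & a_{22} \end{vmatrix}, \qquad x_2 \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} a_{11} & y_1 \\ a_{21} & y_2 \end{vmatrix} \tag{10.11}$$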


From these examples it emerges that determinants may be
important with respect to the solution of simultaneous equations.
For this reason in the following chapter determinants of general
matrices are considered.
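
Equation (10.11) translates directly into a few lines of code. The following Python sketch is an illustration only; the function names `det2` and `solve_2x2` and the numerical system are assumptions chosen for the example.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Second-order determinant as defined in (10.4)."""
    return a11 * a22 - a12 * a21

def solve_2x2(a, y):
    """Solve the system (10.8) by the determinant formula (10.11)."""
    d = det2(a[0][0], a[0][1], a[1][0], a[1][1])
    if d == 0:
        raise ValueError("determinant vanishes: no unique solution")
    x1 = Fraction(det2(y[0], a[0][1], y[1], a[1][1]), d)
    x2 = Fraction(det2(a[0][0], y[0], a[1][0], y[1]), d)
    return x1, x2

# 2 x1 + x2 = 5 and x1 + 3 x2 = 10
x1, x2 = solve_2x2([[2, 1], [1, 3]], [5, 10])
print(x1, x2)   # x1 = 1, x2 = 3
```

Exact fractions are used so that the result can be compared with a solution obtained by elimination without rounding errors.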


EXERCISES 
1. Given the matrix 


δ᾽, ee 
A=/| 3 -4 4 
— ae 


and the vector y having the components -5, 1, 4 show that
equation Ax = y is solved by the vector (1, 2, 3) and also by the
SS ἀὐθοῖ, “ as panes a solution of the equation Ax = 0 
; € traces and determinants ἱ 
3. Solve by elimination
$$5x_1 - 3x_2 = 1$$
$$2x_1 - 4x_2 = -1$$

Show that this agrees with the solution obtained by applying equation
(10.11).


CHAPTER 4 


DETERMINANTS 


11. Definition and properties 
The determinant of a 3 x 3 matrix D is defined as

$$\det D = d_{11}d_{22}d_{33} + d_{12}d_{23}d_{31} + d_{13}d_{21}d_{32} - d_{11}d_{23}d_{32} - d_{12}d_{21}d_{33} - d_{13}d_{22}d_{31}$$

These expressions are related to equations with three unknowns in a
way comparable to (10.11) but the necessary deductions are cumbersome.
For matrices of 4th or higher order any direct definition is hard
to appreciate and even harder to evaluate. For this reason the general 
definition of determinants will be given implicitly in terms of their 
most important properties and from this indirect definition all those 
relations can be derived which will be required later. 


DEFINITION (11.A) A determinant of nth order is a scalar function of
an n x n matrix, specified in terms of column vectors and subject to
three conditions:

(α) The determinant of any unit matrix is equal to unity.

(β) If a matrix B is derived from a matrix A by multiplying a
column vector by a scalar factor λ then det B = λ det A.

(γ) If a matrix C is derived from a matrix A by adding a column
vector to another column vector then det C = det A.


These definitions are generalizations of equations (10.5), (10.6) and 
(10.7). 

Other properties of determinants are readily derived from the 
definition. 
THEOREM (11.B) If a matrix B is derived from a matrix A by adding
a scalar multiple of a column vector to another column vector then
det B = det A.

By (β) det A becomes λ det A if the column vector $a_{\cdot k}$ is multiplied
by λ. This value persists if $\lambda a_{\cdot k}$ is added to $a_{\cdot j}$ on account of (γ). Sub-
sequent multiplication of $\lambda a_{\cdot k}$ by 1/λ reduces the column vector and
the determinant to their original values, whereas the vector $a_{\cdot j}$ has
been replaced by $a_{\cdot j} + \lambda a_{\cdot k}$.


THEOREM (11.C) If two columns of the matrix A are permuted, the
determinant of the resulting matrix D is det D = - det A.


From A new matrices are derived by successive substitutions,

$$a'_{\cdot j} = a_{\cdot j} + a_{\cdot k}, \qquad a'_{\cdot k} = a_{\cdot k} - a'_{\cdot j} = -a_{\cdot j}, \qquad a''_{\cdot j} = a'_{\cdot j} + a'_{\cdot k} = a_{\cdot k}$$

By (γ) and (11.B) the determinants of the resulting matrices are equal to det A
whereas the column $a_{\cdot j}$ is replaced by $a_{\cdot k}$ and the column $a_{\cdot k}$ is re-
placed by $-a_{\cdot j}$. Subsequent multiplication of column $a'_{\cdot k}$ by (-1)
completes the permutation of the columns and multiplies the deter-
minant by (-1).


THEOREM (11.D) If any column in a matrix is a null vector the 
determinant is zero, 


Multiplication of any column by a scalar factor λ (λ ≠ 0, λ ≠ 1) must, in
accordance with (β), multiply the determinant by λ. Multiplication of
the null vector by λ does not change the column and, accordingly,
could not change the determinant. These two statements are in-
compatible unless the determinant vanishes.
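
The conditions of (11.A) and the theorems (11.B) to (11.D) can be checked on a numerical example. The Python sketch below uses the explicit third-order determinant given at the beginning of this section; the sample matrix and the helper names (`det3`, `scale_column`, `add_column`, `swap_columns`) are assumptions made for illustration.

```python
def det3(d):
    """Third-order determinant written out term by term."""
    return (d[0][0] * d[1][1] * d[2][2] + d[0][1] * d[1][2] * d[2][0]
            + d[0][2] * d[1][0] * d[2][1] - d[0][0] * d[1][2] * d[2][1]
            - d[0][1] * d[1][0] * d[2][2] - d[0][2] * d[1][1] * d[2][0])

def scale_column(a, k, lam):
    """Multiply column k by the scalar lam."""
    return [[lam * x if c == k else x for c, x in enumerate(row)] for row in a]

def add_column(a, src, dst, lam=1):
    """Add lam times column src to column dst."""
    return [[x + lam * row[src] if c == dst else x
             for c, x in enumerate(row)] for row in a]

def swap_columns(a, j, k):
    """Permute columns j and k."""
    out = [row[:] for row in a]
    for row in out:
        row[j], row[k] = row[k], row[j]
    return out

A = [[2, 0, 1], [1, 3, -1], [4, 1, 5]]
unit = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(det3(unit))                                 # condition (alpha)
print(det3(scale_column(A, 1, 7)), 7 * det3(A))   # condition (beta)
print(det3(add_column(A, 0, 2)), det3(A))         # condition (gamma)
print(det3(add_column(A, 0, 2, lam=-3)))          # theorem (11.B)
print(det3(swap_columns(A, 0, 2)))                # theorem (11.C)
print(det3(scale_column(A, 1, 0)))                # theorem (11.D)
```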


12. Evaluation of determinants 


In spite of its indirect approach the definition (11.A) can be used for
deriving the numerical value of determinants.


THEOREM (12.A) The determinant of a diagonal matrix is equal to 
the product of the diagonal elements. 


Beginning with a unit matrix which has the determinant unity, the
column vectors and hence the diagonal elements are successively
multiplied by $a_{11}, \ldots, a_{nn}$, the diagonal elements of the diagonal
matrix A. Hence the proposition is proved.

Consider now a matrix

$$B = \begin{pmatrix} b_{11} & 0 & 0 & \cdots \\ b_{21} & b_{22} & 0 & \cdots \\ b_{31} & b_{32} & b_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \tag{12.1}$$
in which all elements to the right of the diagonal are zero ($b_{jk} = 0$
for k > j). In order to find the determinant of B the last column is multiplied
by $(-b_{n1}/b_{nn})$ and added to the first column; $b_{n1}$ is thereby replaced by 0
without affecting any other matrix element. By a similar procedure
the elements $b_{n2}, \ldots, b_{n(n-1)}$ can be replaced by 0 without any
other matrix element being changed. In the same way, all terms in the
(n - 1)th row left of the leading diagonal are replaced by 0 and
eventually all matrix elements to the left of the diagonal are made to
vanish without changing the diagonal elements. The determinant is
not affected by these substitutions; it must accordingly be equal to the
determinant of the diagonal matrix to which B is finally converted.
This determinant is equal to the product of the diagonal elements of
the diagonal matrix and, accordingly, to the product of the diagonal
elements of B itself.
Thus it is possible to evaluate the determinant of any matrix which
can be converted to the form (12.1) by linear combination of columns.
For this purpose the matrix $A^{(1)}$ is in succession converted to $A^{(2)}$,
$A^{(3)}$ ... B by the substitutions

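$$a^{(2)}_{jk} = a^{(1)}_{jk} - \frac{a^{(1)}_{j1}a^{(1)}_{1k}}{a^{(1)}_{11}} \qquad (k = 2, \ldots, n)$$
$$a^{(3)}_{jk} = a^{(2)}_{jk} - \frac{a^{(2)}_{j2}a^{(2)}_{2k}}{a^{(2)}_{22}} \qquad (k = 3, \ldots, n) \tag{12.2}$$
$$b_{jn} = a^{(n-1)}_{jn} - \frac{a^{(n-1)}_{j(n-1)}a^{(n-1)}_{(n-1)n}}{a^{(n-1)}_{(n-1)(n-1)}} \qquad (k = n)$$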


Before performing any of these substitutions it is necessary to make
sure that the diagonal element in the denominator does not vanish.
If it is zero, the substitution is preceded by a permutation or a linear
combination of columns in order to replace the zero in the diagonal.
If any of the columns is converted to a null vector the value
of the determinant is identified as zero. By this method the deter-
minant of any matrix is uniquely defined.

The elements of $A^{(r+1)}$ depend on the elements of $A^{(r)}$ in the
columns k > r and rows j > r. Expressions for $a^{(r+1)}_{jk}$ and $a^{(r+1)}_{kj}$ depend on the
elements of $A^{(r)}$ in such a way that they are converted into each
other by permuting j and k. This remains true if they are expressed in
terms of the elements of $A^{(r-1)}, \ldots, A^{(1)}$. Diagonal elements of $A^{(r)}$
and, in particular, of B depend on the rows and columns of $A^{(1)}$ in
an entirely symmetric way. As the determinant of $A^{(1)}$ depends on
the diagonal elements $b_{jj}$ only it is not changed by transposing:

$$\det \tilde{A} = \det A \tag{12.3}$$
For the same reason the term 'column' can be replaced by 'row' in
(11.A), (11.B), and (11.C) and in the results to follow whenever
properties of determinants are considered.
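
The procedure of this section can be sketched in a few lines of Python. The function below is an illustration under stated assumptions, not a definitive implementation: it applies column substitutions of the form (12.2), permutes columns when a diagonal element vanishes, and multiplies the diagonal elements of the resulting matrix of the form (12.1). The name `det_by_columns` and the sample matrices are hypothetical.

```python
from fractions import Fraction

def det_by_columns(a):
    """Determinant by column substitutions of the type (12.2)."""
    n = len(a)
    m = [[Fraction(x) for x in row] for row in a]
    sign = 1
    for r in range(n):
        if m[r][r] == 0:
            # permute with a later column that has a non-zero element in row r
            for k in range(r + 1, n):
                if m[r][k] != 0:
                    for row in m:
                        row[r], row[k] = row[k], row[r]
                    sign = -sign          # theorem (11.C)
                    break
            else:
                return Fraction(0)        # no such column: determinant vanishes
        for k in range(r + 1, n):
            factor = m[r][k] / m[r][r]
            for j in range(n):
                # a_jk is replaced by a_jk - a_jr a_rk / a_rr
                m[j][k] -= m[j][r] * factor
    result = Fraction(sign)
    for r in range(n):
        result *= m[r][r]                 # product of the diagonal elements
    return result

A = [[2, 0, 1], [1, 3, -1], [4, 1, 5]]
A_transposed = [list(col) for col in zip(*A)]
print(det_by_columns(A), det_by_columns(A_transposed))   # equal, by (12.3)
```

Exact fractions avoid rounding in the divisions; with floating-point numbers the same scheme would need a tolerance when testing for a vanishing diagonal element.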


13. Expansion of determinants 


The way in which determinants depend upon the matrix elements is
conveniently studied by the way of an expansion

$$\det A = a_{11}\alpha_{11} + a_{12}\alpha_{12} + \ldots + a_{1n}\alpha_{1n} \tag{13.1}$$
the validity of which is to be established in this section. The co-
efficients $\alpha_{1k}$ are called the cofactors of $a_{1k}$ and depend upon matrix
elements outside the first row and the kth column. By deleting $a_{1\cdot}$
and $a_{\cdot k}$ a matrix with one row and column less than A is defined.
The determinant of this matrix multiplied by $(-1)^{k+1}$ is the co-
factor of $a_{1k}$. The expansion (13.1) can, on account of the symmetric
dependence of determinants on rows and columns, be converted to
an expansion progressing in elements of the first column.

Consider at first a matrix the last column of which consists of
variables $a_{jn} = x_j$ whereas the other matrix elements are fixed num-
bers. If the substitutions (12.2) are performed the variables $x_j$ are
grouped into homogeneous linear functions in which every term
carries a factor $x_j$. If all matrix elements to the right of the diagonal
are replaced by zero the variables $x_j$ are assembled in the matrix
element $b_{nn}$, whereas all other matrix elements are independent of the
variables. The determinant is, accordingly, a linear function of
$x_1, \ldots, x_n$. Thus determinants are linear functions of the elements in
the last column. However any column can be interchanged with any
other column of the determinant without affecting the magnitude of the
determinant, but changing its sign. The following theorem is there-
fore proved.


THEOREM (13.A) Determinants are linear homogeneous functions of
the matrix elements.
It follows that a determinant can be expanded as follows:
$$\det A = a_{11}q_1 + a_{12}q_2 + \ldots + a_{1n}q_n$$
where the coefficients are independent of the matrix elements of the
first row.


In order to identify the factor $q_1$ consider a matrix $A^{(1)}$ in which
the first row of A is replaced by (1 0 0 ... 0). This is converted by the
substitution
$$a^{(2)}_{jk} = a^{(1)}_{jk} - \frac{a^{(1)}_{j1}a^{(1)}_{1k}}{a^{(1)}_{11}}$$
into a matrix in which the elements of the first row and the first
column vanish, excepting $a^{(2)}_{11} = 1$. The sub-matrix which remains
after deleting the first row and the first column involves those matrix
elements upon which $q_1$ depends. Now
$$\det A^{(1)} = \det A^{(2)} = q_1$$

$q_1$ remains accordingly unchanged if one column is added to another
out of the columns 2 to n. Also if any of these column vectors is
multiplied by any scalar factor, $q_1$ is multiplied by this scalar. $q_1$
would be equal to unity if the matrix $A^{(2)}$ and, accordingly, the sub-
matrix, were a unit matrix. It follows then that $q_1$ as a function of the
matrix elements of the sub-matrix must be a determinant; since this
function is uniquely defined $q_1$ must be equal to $\alpha_{11}$.

In identifying the factors $q_2, q_3, \ldots, q_n$ the first row of A is succes-
sively replaced by a row in which the 2nd, 3rd ... nth element is
equal to 1, the other elements being equal to 0. The second column is
permuted with the first column and the above argument is repeated;
remembering that the permutation changes the sign of the deter-
minant, $q_2$ is identified as $\alpha_{12}$. The third column is permuted with the
second thus involving another change of sign and allowing $q_3$ to be
identified with $\alpha_{13}$. Continuing in this manner, all factors $q_k$ are seen
to be equal to the cofactors.

After having established the validity of the expansion (13.1) this 
same expansion can be applied to the cofactors. It is not necessary to 
present any expansion formulae explicitly. It is sufficient to notice 
that by continued application of the series (13.1) every determinant 
can be derived from determinants of 2nd order which are explicitly 
defined in Section 10. 

If in equation (13.1) the matrix elements $a_{1k}$ are replaced by $a_{2k}$,
the series is equal to the determinant of a matrix in which the first
row is equal to the second row and which vanishes accordingly. The
same is true if in (13.1) the first row is replaced by any other row:

$$a_{j1}\alpha_{11} + a_{j2}\alpha_{12} + \ldots + a_{jn}\alpha_{1n} = 0 \qquad (j = 2, 3, \ldots, n) \tag{13.2}$$
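
The expansion (13.1) and the relation (13.2) lend themselves to a short recursive program. The Python sketch below is illustrative; the function names `minor`, `cofactor` and `det` and the sample matrix are assumptions. Indices in the code start at 0, which leaves the sign factor of the cofactors unchanged because it depends only on whether the sum of row and column number is even or odd.

```python
def minor(a, row, col):
    """Matrix obtained from a by deleting one row and one column."""
    return [[x for c, x in enumerate(r) if c != col]
            for i, r in enumerate(a) if i != row]

def cofactor(a, j, k):
    """Signed determinant of the minor belonging to element (j, k)."""
    return (-1) ** (j + k) * det(minor(a, j, k))

def det(a):
    """Expansion (13.1) in the elements of the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][k] * cofactor(a, 0, k) for k in range(len(a)))

A = [[2, 0, 1], [1, 3, -1], [4, 1, 5]]
first_row_cofactors = [cofactor(A, 0, k) for k in range(3)]

print(det(A))
# relation (13.2): another row combined with the first-row cofactors
for j in (1, 2):
    print(sum(A[j][k] * first_row_cofactors[k] for k in range(3)))
```

The recursion reduces every determinant to determinants of lower order, as described above; its cost grows rapidly with the order of the matrix.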


14. Proof of consistency 


This section is only indirectly linked to the other parts of the book. 
It is added for the benefit of those readers who insist on rigour in 
mathematical deductions. 

The arguments of Sections 11-13 are established on the tacit
assumption that the conditions (α), (β), and (γ) of (11.A) are mutually
compatible. This is not self-evident; construction of determinants by
applying the expansion (13.1) could fail.

It will now be shown that the validity of the three conditions for
matrices of order n - 1 implies their validity for matrices of nth
order. Their validity for 2 x 2 matrices has been demonstrated in
Section 10.

The existence of (n - 1)th order determinants is assumed and
it is shown that the right-hand side of (13.1) complies with the three
conditions (11.A).

Inspection of equation (13.1) shows that condition (β) is complied
with: any scalar multiplier of the row $a_{1\cdot}$ multiplies det A. Every
scalar multiplier of any other row multiplies the cofactors and hence
det A. If A is a unit matrix, $a_{11}$ and $\alpha_{11}$ are equal to unity and the
other terms in the series vanish. Hence condition (α) is complied with.
Validity of condition (γ) is obvious as far as rows 2 to n are con-
cerned, since these rows belong to the cofactors. It remains to prove
that addition of the first row to the second or of the second row to the
first does not change the determinant; if this proof is successful it
also applies to addition of the first row to all rows from 3 to n.

Expanding the cofactors $\alpha_{1j}$ in the elements of the second row

$$\det A = \sum_j \sum_k a_{1j}a_{2k}\beta_{jk} \tag{14.1}$$

where the coefficients are independent of the elements in the first and
second row. $\pm\beta_{jk}$ is equal to the determinant of the matrix derived
from A by deleting rows 1 and 2 and columns j and k. $\beta_{jj} = 0$ since
the cofactor of $a_{1j}$ is independent of the column j; $\beta_{rs}$ and $\beta_{sr}$ differ,
if at all, only by their sign. Thus (14.1) can be written as

$$\det A = \sum_{j=2}^{n}\sum_{k=1}^{j-1} \beta_{jk}a_{1j}a_{2k} + \sum_{j=2}^{n-1}\sum_{k=j+1}^{n} \beta_{jk}a_{1j}a_{2k} + a_{11}\sum_{k=2}^{n} \beta_{1k}a_{2k} \tag{14.2}$$
Counting the number of columns in the cofactors from the lowest 
to the kth inclusive, one finds in the first term of (14.2) k columns, 
in the second term k — 1 columns, in the third term k — 2 columns. 
As the sign of the cofactors is (—1)!, the sign of ,; is (—1)!+* in 
the first term, (—1)’** in the second term and (—1)/+*- in the third 


term. Hence $\beta_{rs} = -\beta_{sr}$ and (14.1) can be written as
$$\det A = \sum_{r=2}^{n}\sum_{s=1}^{r-1} \beta_{rs}(a_{1r}a_{2s} - a_{1s}a_{2r}) \tag{14.3}$$


This expression is not changed if the first or the second row—but 
not both simultaneously—are replaced by the sum of the two rows. 

This result completes the induction proof; the compatibility of 
the conditions (11.A) and the existence of determinants of every 
order is thus proved.

EXERCISES
1. Evaluate the determinants 
4. 2 1 —3 -- 2 a 
13 2 2! and 1 —1 -1 4 
=e 6S 2 l 0 2 
| | ae 


2. Show that the matrix 


2 3-: 4 Si 
: — 4 3 1 
—Si 1 -ἰ 

has a real determinant.
3. Using the theorems of Section 11, show that the determinant 
of a matrix must vanish if two rows are equal to each other. 


4. Using the definition of cofactors, the expansion (13.1), equation
(12.3) and theorem (11.C) show that the expansion

$$\det A = \sum_{k=1}^{n} a_{jk}\alpha_{jk}$$

is valid and hence that

$$\sum_{j=1}^{n} a_{jk}\alpha_{jm} = \delta_{km} \det A$$
5. Show that repeated substitutions of the form (12.2) convert the 
matrix 


A = 
= oe 
to 
Ὡς ἢ 
ΓΕΒ ΝΗΝ ΘΗΝ 
mls Of oa oe 


Ι ΟΡ ate 


and thus derive det A. (The matrix obtained by permuting the 2nd
and 4th column of A would yield after the first substitution a
vanishing diagonal element although the determinant does not
vanish.)


CHAPTER 5 


MATRICES AND LINEAR EQUATIONS


15. Determinants and homogeneous equations 


Consider a matrix A with columns which are linearly dependent.
In this case
$$\sum_{r} a_{\cdot r}x_r = 0$$
not all $x_r$ being zero; if in addition $x_1 \neq 0$

$$a_{\cdot 1} = -\sum_{r=2}^{n} a_{\cdot r}(x_r/x_1)$$

By successive addition of all $a_{\cdot r}(x_r/x_1)$ to the vector $a_{\cdot 1}$, this vector is
converted to a null vector so that the determinant vanishes. If
$x_1 = 0$ this can be proved after a permutation of columns.


THEOREM (15.A) If the column vectors of a matrix are linearly 
dependent the determinant of the matrix vanishes. 


On account of the argument in Section 7 the term 'row vectors'
may be substituted for 'column vectors' in (15.A). The converse of
theorem (15.A) says:


THEOREM (15.B) If the determinant of a matrix vanishes, the columns 
are linearly dependent. 


Consider an (n x n) matrix B with columns that are linearly
independent. By permutation of columns it is then possible to make
$b_{11} \neq 0$. By a substitution of the form (12.2) the elements $b_{12}, \ldots, b_{1n}$
are replaced by 0. The sub-matrix derived from B by deleting the
first row and column must have linearly independent columns. For
if, for non-vanishing $q_k$,

$$\sum_{k=2}^{n} b_{jk}q_k = 0 \qquad (j = 2, \ldots, n)$$
it would be possible to choose $q_1 = 0$ and to establish the linear
dependence
$$\sum_{k=1}^{n} b_{jk}q_k = 0$$
contrary to premises.
In the present instance
$$\det B = b_{11}B_{11}$$
$B_{11}$ being the cofactor to $b_{11}$. If it is assumed that it has been proved
that (15.B) is valid for (n - 1) x (n - 1) matrices this cofactor is
non-zero. As $b_{11} \neq 0$ this would show that det B ≠ 0. Thus validity
of theorem (15.B) for matrices of any order would imply the validity
of theorem (15.B) for matrices of the next higher order. Thus it re-
mains to show that it is valid for 2 x 2 matrices.
Assuming that the determinant vanishes,
$$a_{11}a_{22} - a_{12}a_{21} = 0$$
it follows that
$$(a_{11}/a_{21}) = (a_{12}/a_{22}) = \gamma$$
Then the matrix has the form
$$\begin{pmatrix} \gamma a_{21} & \gamma a_{22} \\ a_{21} & a_{22} \end{pmatrix}$$
so that the linear dependence is established
$$a_{1\cdot} - \gamma a_{2\cdot} = 0$$
Theorem (15.B) is accordingly true for second-order matrices, and,
on account of the above induction, for matrices of every higher order.
On account of the results of Section 7 one arrives at the following
general conclusion.
THEOREM (15.C) A homogeneous system of linear equations
$$Ax = 0$$
admits solutions x ≠ 0 if and only if det A = 0.


From this theorem the most important deductions in matrix 
algebra are derived. 


16. Products and powers of matrices 


Multiplication of a vector by a matrix generates another vector.
This vector may be multiplied again by a matrix. Thus if
$$Bu = v \quad \text{and} \quad Av = w$$

the resulting equation
$$ABu = w$$
says that the 'product matrix' AB multiplies the vector u and gen-
erates the vector w. In terms of components this comes to

$$\sum_j b_{kj}u_j = v_k, \qquad \sum_k a_{mk}v_k = w_m$$

Hence the matrix element of the product matrix C = AB is given by
$$c_{mk} = \sum_j a_{mj}b_{jk} \tag{16.1}$$

The subscript j is a 'dummy' and does not enter in the result. Equa-
tion (16.1) is an expression for the rule of matrix multiplication.

As the multiplication of vectors by matrices can be repeated any 
number of times and its result is unambiguous, matrix multiplication 
should comply with the associative law 


A(BC) = (AB)C (16.2) 
This is verified by applying equation (16.1). Matrix elements of BC
are given by
$$\sum_m b_{km}c_{mp}$$

so that the left-hand side of (16.2) is

$$\sum_k a_{jk} \sum_m b_{km}c_{mp} \tag{16.3}$$

The matrix elements of AB are given by

$$\sum_k a_{jk}b_{km}$$

so that the right-hand side of (16.2) is

$$\sum_m \Big( \sum_k a_{jk}b_{km} \Big) c_{mp} \tag{16.4}$$

Since all summation signs can be shifted to the left of the matrix
elements, expressions (16.3) and (16.4) are equal to each other and con-
veniently written as
$$\sum_k \sum_m a_{jk}b_{km}c_{mp}$$

thus proving the associative law.
The commutative law of multiplication does not apply to matrices.
The products AB and BA need not be equal to each other. If in a 
particular instance the two products are equal the matrices A and B 
are said to ‘commute’. Every matrix commutes with the unit matrix: 


$$AI = IA = A \tag{16.5}$$
Powers of matrices are explained in terms of repeated multiplication.
$A^2$ has the elements $\sum_k a_{jk}a_{km}$ and $A^3 = A^2A = AA^2$.

The product of two matrices may be the null matrix even if both 
factors differ from zero. This is an important difference between the 
algebra of numbers and of matrices but it is not of great importance 
in the applications to physics given in this book. 
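
The rule (16.1) and the statements made about products can be illustrated with a short Python sketch. The function name `matmul` and all sample matrices are assumptions chosen for the example.

```python
def matmul(a, b):
    """Product matrix according to rule (16.1)."""
    return [[sum(a[m][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for m in range(len(a))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]

print(matmul(A, B), matmul(B, A))   # the two products differ
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))   # associative law (16.2)

# two non-vanishing factors whose product is the null matrix
P = [[1, 0], [0, 0]]
Q = [[0, 0], [0, 1]]
print(matmul(P, Q))

# powers by repeated multiplication: A^3 = A^2 A = A A^2
A2 = matmul(A, A)
print(matmul(A2, A) == matmul(A, A2))
```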

By the rules of addition and multiplication matrix polynomials are 
defined which behave in many respects like polynomials of numbers. 
Thus

$$P = 5I + 4A - 3A^2 + 2A^3$$

is a matrix with the elements
$$p_{jm} = 5\delta_{jm} + 4a_{jm} - 3\sum_k a_{jk}a_{km} + 2\sum_k \sum_l a_{jk}a_{kl}a_{lm}$$


If the necessary conditions of convergence are complied with, it is
possible to define transcendental functions of matrices by way of
infinite series, for instance

$$e^A = I + \sum_{n=1}^{\infty} (1/n!)A^n$$
However, if A and B do not commute, then $e^A e^B$ is in general not equal to
$e^{A+B}$.
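
The exponential series can be summed numerically. The following Python sketch truncates the series after a fixed number of terms, which is an assumption adequate for the small matrices used here; the names `mat_exp`, `matmul` and `identity` and the two sample matrices are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[m][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for m in range(len(a))]

def identity(n):
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]

def mat_exp(a, terms=25):
    """Partial sum of the series I + A + A^2/2! + A^3/3! + ..."""
    n = len(a)
    result = identity(n)
    term = identity(n)
    for m in range(1, terms):
        # after this step term equals A^m / m!
        term = [[x / m for x in row] for row in matmul(term, a)]
        result = [[result[j][k] + term[j][k] for k in range(n)] for j in range(n)]
    return result

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]            # A and B do not commute
A_plus_B = [[0, 1], [1, 0]]

product = matmul(mat_exp(A), mat_exp(B))
exp_sum = mat_exp(A_plus_B)
print(product)                   # elements 2, 1, 1, 1
print(exp_sum)                   # elements cosh 1 and sinh 1
print(math.cosh(1), math.sinh(1))
```

For these two matrices the series for each factor terminates after the linear term, whereas the series for the sum does not, so the two results differ.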
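From C = AB it follows that

$$\tilde{c}_{jm} = c_{mj} = \sum_k a_{mk}b_{kj} = \sum_k \tilde{b}_{jk}\tilde{a}_{km}$$

Hence
$$\tilde{C} = \tilde{B}\tilde{A} \tag{16.6}$$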

In a similar way it can be shown that the adjoint of a matrix
product is equal to the product of the adjoint matrices taken in the
inverse order

$$(AB)^{\dagger} = B^{\dagger}A^{\dagger} \tag{16.7}$$

At this stage the question arises whether a division of a matrix by 
another matrix can be defined. Given a matrix A it might be possible 
to identify another matrix B which complies with the condition 
BA = I. If this matrix exists it would conveniently be denoted by
$A^{-1}$ and called the reciprocal (or inverse) of A. If B also has a
reciprocal then

$$B^{-1}BAB = B^{-1}B = I$$

and hence AB = I so that there is a pair of mutually reciprocal and 
commuting matrices. In this way negative powers of matrices are 
defined.

Assuming that matrices B and C are given and B has a reciprocal
then division of C by B would have a meaning because $X = CB^{-1}$
would satisfy the equation XB = C.

The reciprocal of a matrix product is given by

$$(AB)^{-1} = B^{-1}A^{-1} \tag{16.8}$$
This follows from
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = I$$

Apart from the rule (16.1) a different kind of multiplication of 
matrices has been defined and is occasionally used. 

Given a ν x ν matrix P and a μ x μ matrix Q the 'direct product'
of these matrices is defined as a νμ x νμ matrix R

$$R = P \times Q$$
Every element of R is the product of an element of P and an element
of Q. R consists of sub-matrices in which every element of P appears
in the right order and is multiplied into one and the same element of Q.
In general Q x P differs from P x Q.

However, ordinary multiplication and direct multiplication com-
mute. If P and S are ν x ν matrices and Q and T are μ x μ
matrices and R = P x Q, U = S x T then

$$RU = (P \times Q)(S \times T) = PS \times QT \tag{16.9}$$


This is proved by evaluation of the product in terms of matrix 
elements. As the proof is lengthy, but otherwise trivial, it is omitted. 


It may be noticed that the direct product of diagonal matrices is a 
diagonal matrix. 
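
Relation (16.9), whose proof is omitted above, can at least be verified numerically. The Python sketch below builds the direct product in the arrangement described in this section, with sub-matrices formed by P multiplied into one element of Q; the function names `direct` and `matmul` and the sample matrices are assumptions.

```python
def matmul(a, b):
    return [[sum(a[m][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for m in range(len(a))]

def direct(p, q):
    """Direct product: the sub-matrix in block (a, b) is q[a][b] times p."""
    nu, mu = len(p), len(q)
    return [[q[a][b] * p[j][k] for b in range(mu) for k in range(nu)]
            for a in range(mu) for j in range(nu)]

P = [[1, 2], [3, 4]]
S = [[0, 1], [1, 1]]
Q = [[0, 1], [1, 0]]
T = [[2, 0], [1, 3]]

R = direct(P, Q)
U = direct(S, T)
print(R)
print(direct(Q, P))   # differs from R
print(matmul(R, U) == direct(matmul(P, S), matmul(Q, T)))   # relation (16.9)
```

If the opposite block arrangement is preferred (P on the outside), the roles of the two arguments are interchanged; relation (16.9) holds in either convention.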


17. Equations admitting unique solutions 
Consider an equation
$$Ax = y \tag{17.1}$$

and let the columns of A be linearly independent. According to
Sections 8 and 9 equation (17.1) is uniquely soluble and the solution
can be given in the form

$$x = By \tag{17.2}$$

where the matrix B is independent of the vector y. Equations (17.1)
and (17.2) imply that
$$BAx = By = x$$
and hence that
$$BA = I$$
The matrix B is, accordingly, the reciprocal of A. If the determinant
of a matrix differs from 0 the matrix has certainly a reciprocal.

If det A = 0 then the equation Ax = 0 is soluble for x ≠ 0.
Then BAx = 0 for any matrix B. If B were the reciprocal of A then
BAx = Ix = 0 with x ≠ 0. This is impossible. It follows that A
cannot have a reciprocal.


THEOREM (17.A) A matrix has a reciprocal if and only if its deter- 
minant does not vanish. 


This theorem is analogous to the exclusion of division by 0 in the 
algebra of numbers. 

The elements of the reciprocal matrix of A are determined by the
equations

$$\sum_k a_{jk}a^{-1}_{km} = \delta_{jm}$$
which can be solved by comparison with the expansion
$$\sum_k a_{jk}\alpha_{mk} = (\det A)\,\delta_{jm}$$
$$a^{-1}_{km} = (\det A)^{-1}\alpha_{mk} \tag{17.3}$$
The matrix $A^{-1}(\det A)$ is called the 'adjugate' to A and is equal
to the transposed matrix of the cofactors. The reciprocal matrix can,
accordingly, be expressed in terms of the determinant of the matrix
and the cofactor of each matrix element.
The solution of (17.1) can also be expressed in terms of the co-
factors.
$$x_j = \sum_k a^{-1}_{jk}y_k = (\det A)^{-1}\sum_k \alpha_{kj}y_k = (\det A)^{-1}\sum_k (y_k\alpha_{kj}) \tag{17.4}$$

According to exercise 4, Chapter 4 the last sum is equal to the
determinant of a matrix derived from A by substituting y for $a_{\cdot j}$ or,
equally, from the transpose of this matrix. The solution of (17.1) is
then written as

$$x_j = (\det A)^{-1} \begin{vmatrix} a_{11} & \cdots & a_{1(j-1)} & y_1 & a_{1(j+1)} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n(j-1)} & y_n & a_{n(j+1)} & \cdots & a_{nn} \end{vmatrix} \tag{17.5}$$
This formula is known as Cramer's formula for solving linear
equations. It is obviously restricted to matrices with non-vanishing
determinants. A simple application is shown in (10.8).

Solutions of linear equations can, accordingly, be frequently
expressed in terms of determinants. The use of this representation 
is limited since the evaluation of determinants is hardly less cumber- 
some than the conventional solution by elimination. 
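
Equations (17.3) and (17.5) can be compared on a small example. The Python sketch below is illustrative only; it evaluates determinants by the expansion (13.1), and the names `det`, `cofactor`, `inverse`, `cramer` and `mat_vec` as well as the sample system are assumptions.

```python
from fractions import Fraction

def minor(a, row, col):
    return [[x for c, x in enumerate(r) if c != col]
            for i, r in enumerate(a) if i != row]

def det(a):
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][k] * cofactor(a, 0, k) for k in range(len(a)))

def cofactor(a, j, k):
    return (-1) ** (j + k) * det(minor(a, j, k))

def inverse(a):
    """Reciprocal matrix from (17.3): transposed cofactors over det A."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant vanishes: no reciprocal exists")
    n = len(a)
    return [[Fraction(cofactor(a, m, k), d) for m in range(n)] for k in range(n)]

def cramer(a, y):
    """Cramer's formula (17.5): column j of A is replaced by y."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant vanishes: formula not applicable")
    xs = []
    for j in range(len(a)):
        replaced = [[y[i] if c == j else x for c, x in enumerate(row)]
                    for i, row in enumerate(a)]
        xs.append(Fraction(det(replaced), d))
    return xs

def mat_vec(a, x):
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]

A = [[2, 0, 1], [1, 3, -1], [4, 1, 5]]
y = [5, 4, 21]
print(cramer(A, y))                  # components 1, 2, 3
print(mat_vec(inverse(A), y))        # the same vector, obtained as x = By
```

As remarked above, this representation is mainly of theoretical interest: for matrices of higher order, elimination requires far fewer operations than the evaluation of cofactors.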


18. Determinant and trace of matrix products 


The determinant of a matrix product (AB) is, by (13.A) and (16.1) a 
polynomial in the matrix elements of A and B. Whatever the co- 
efficients of this polynomial may be they must not depend upon the 
numerical values of det A or of det B. In investigating the form of 
the determinant we are free to make any arbitrary assumption 
regarding det A or det B. 

Thus let det B be zero. By (15.C) the equation Bx = 0 has non- 
vanishing solutions. The equation ABx = 0 accordingly also has 
non-vanishing solutions and, by (15.C), det (AB) = 0, for every 
matrix A. This is possible only if det B is a factor of det (AB). 

Let now det B ≠ 0 and det A = 0. Then the equation Bx = y
is uniquely soluble for every y. This vector may be chosen in such a
way that it solves the equation Ay = 0 without being a null vector.
It follows then that

$$ABx = Ay = 0$$
for x ≠ 0 and therefore det (AB) = 0. Therefore det A must be a
factor of det (AB). Thus
$$\det(AB) = \lambda(\det A)(\det B)$$
where λ is a scalar multiplier and independent of the two determi-
nants. By setting A = B = AB = I this factor is identified as being
equal to unity.


THEOREM (18.A) The determinant of a matrix product is equal to the 
product of the determinants of the individual matrices. 


It follows that
$$\det(AB) = \det(BA)$$
even if AB ≠ BA.

The trace of a matrix product (AB) is the sum of the diagonal
elements, i.e.

$$\mathrm{tr}(AB) = \sum_{j,k} a_{jk}b_{kj} \tag{18.1}$$
Similarly,
$$\mathrm{tr}(BA) = \sum_{j,k} b_{jk}a_{kj} \tag{18.2}$$
(18.1) and (18.2) differ from each other only by the symbols used for
the dummy subscripts and must, therefore, be equal to each other.
$$\mathrm{tr}(AB) = \mathrm{tr}(BA) \tag{18.3}$$
even if AB ≠ BA.
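
Theorem (18.A) and equation (18.3) are easily verified for two matrices which do not commute. In the Python sketch below the function names `matmul`, `det` and `trace` and the sample matrices are assumptions made for illustration.

```python
def matmul(a, b):
    return [[sum(a[m][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for m in range(len(a))]

def det(a):
    """Expansion in the elements of the first row."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for k in range(len(a)):
        sub = [row[:k] + row[k + 1:] for row in a[1:]]
        total += (-1) ** k * a[0][k] * det(sub)
    return total

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[j][j] for j in range(len(a)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
AB, BA = matmul(A, B), matmul(B, A)

print(AB, BA)                               # the products differ
print(det(AB), det(BA), det(A) * det(B))    # all three agree
print(trace(AB), trace(BA))                 # both traces agree
```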
Up to this stage the concepts of vectors and matrices were used in the 
first instance as auxiliary devices in the theory of equations. Results 
obtained will now be used for a further study of the properties of 
matrices. 


EXERCISES 


1. Given the matrices 


᾿ Ὁ 4 i: ae. 
A={2 1 2] and B=|0 20 
ei 2 ΕΝ Ἢ 


calculate ΑΒ and BA. Show that the traces and the determinants of 
the products are equal. 
2. Given the matrices 


l 0 2 δ, ἢ Ὁ 
ς --᾿ 1 --ἅ} and D=10 O 1 
2 —1 Ι θ. 4 


derive their squares and cubes. Show that $\det(C^3) = (\det C)^3$.
3. Prove that the determinants of reciprocal matrices must be
reciprocals.
4. Show that

$$e^{aD} = \begin{pmatrix} \cosh a + \sinh a & 0 & 0 \\ 0 & \cosh a & \sinh a \\ 0 & \sinh a & \cosh a \end{pmatrix}$$
where D is defined in exercise 2.
5. Applying matrix multiplication show that 
-} l l 3 
1} 3 - | Ι 
81: ἢ 3 --ἱ Ι 
Ι 1 3 --Ἱ 


is the reciprocal of 
se. μὰ. 


S -τᾶν 22 al 

LC 2h ἃ 

a Ee SO ἘΞ 

6. Verify the validity of the rule (17.5) by applying it to equations 
(6.1). 


CHAPTER 6 


TRANSFORMATIONS 


19. Definition 
From now on the multiplication of a matrix into a vector will no 
longer be associated with the solution of equations. In the relation 
y = Ax (19.1) 

the matrix represents a function by which the independent vector 
variable x is converted into the dependent variable y, a ‘linear vector 
function’. We will occasionally write (Ax) with a bracket as a 
column vector without changing the meaning of A or x. 

The scalar product of a row vector $\tilde{z}^{*}$ and a column vector Ax is a
scalar function of two vectors and is called a ‘bilinear form’:

    $Q(z, x) = \tilde{z}^{*}(Ax) = (\tilde{z}^{*}A)x$    (19.2)

In terms of components

    $Q(z, x) = \sum_j \sum_k z_j^{*} a_{jk} x_k$

The simplest bilinear form is the scalar product of two vectors, which
can be written as $\tilde{z}^{*}(Ix)$.

Expressions of the type (19.1) and (19.2) are used frequently. Their
properties depend upon relations between the elements of the matrix.


The description of any physical object may involve a number of
vectors $x_1, x_2, \ldots$. By means of a matrix T an alternative set of
vectors $x'_j$ is defined by

    $x_j = T x'_j$    (19.3)

and may be used for describing the same object. Then all mathematical
relations involving the original vectors are converted into
relations involving the vectors $x'_j$. The relation between the two sets
of vectors has the form (19.1); it does not, however, establish a relation
between different quantities but rather between different specifications
of the same vectors; this is called a transformation. If the $x_j$
and $x'_j$ are space vectors a transformation is associated with the
transition from one set of coordinates to another.

Transformations are of particular importance in studying the
properties of functions of vectors. In transforming the vectors in
(19.1) or (19.2) it is necessary to transform the matrices
simultaneously.

Thus the transformation of equation (19.1) (with x = Tx′, y = Ty′),

    Ty′ = ATx′

assuming that det T ≠ 0, gives

    y′ = A′x′
where
    $A' = T^{-1}AT$    (19.4)

The transformation (19.3) of a column vector implies that row
vectors are transformed by

    $\tilde{z}^{*} = \tilde{z}'^{*}T^{\dagger}$    (19.5)
so that
    $Q(z, x) = (\tilde{z}'^{*}T^{\dagger})(ATx') = \tilde{z}'^{*}(T^{\dagger}ATx') = \tilde{z}'^{*}(A''x')$
where
    $A'' = T^{\dagger}AT$    (19.6)

Equations (19.4) and (19.6) define ‘collineatory’ and ‘congruent’
matrix transformations respectively. Both depend on the use of a
transformation matrix T. Both are inverted by using $T^{-1}$ in place
of T: $A = TA'T^{-1}$ and $A = (T^{-1})^{\dagger}A''T^{-1}$.
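
The statement that a bilinear form keeps its value when the vectors and the matrix are transformed together can be illustrated numerically. The sketch below assumes that the row vector in (19.2) is the conjugated transpose of z and that the congruent transform is formed with the adjoint of T, as in (19.6); the matrices, the vectors and the helper names are arbitrary choices made for the illustration.

```python
# Sketch: Q(z, x) computed from the original quantities equals Q computed
# from the transformed vectors x', z' and the congruent transform of A.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matvec(M, v):
    """Matrix acting on a column vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def adjoint(M):
    """Transposed and complex-conjugated matrix."""
    return [[complex(M[j][i]).conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def bilinear(z, M, x):
    """Q(z, x): conjugated row vector z, times M, times column vector x."""
    return sum(complex(zj).conjugate() * yj for zj, yj in zip(z, matvec(M, x)))

A = [[1 + 2j, 3], [-1j, 2]]
T = [[1, 1j], [2, 1]]            # det T = 1 - 2i, so T has a reciprocal
x_new = [1 + 1j, 2]              # plays the role of x'
z_new = [3, -1j]                 # plays the role of z'

x_old = matvec(T, x_new)         # x = T x'
z_old = matvec(T, z_new)         # z = T z'
A_congruent = matmul(matmul(adjoint(T), A), T)   # adjoint(T) A T, cf. (19.6)

q_old = bilinear(z_old, A, x_old)
q_new = bilinear(z_new, A_congruent, x_new)
print(q_old, q_new)
```

The two printed values agree to rounding accuracy, which is all that the congruent transformation is designed to guarantee.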


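Examples
1. If the transformation matrix is a diagonal matrix the result of
a collineatory transformation $A' = T^{-1}AT$ has a particularly simple
form:

    $a'_{jk} = a_{jk}\, t_{kk}/t_{jj}$

2. Permutations or linear combinations of matrix elements are
arrived at by suitable congruent transformations $A' = T^{\dagger}AT$.
If
    $t_{hh} = t_{kk} = 0, \quad t_{hk} = t_{kh} = 1, \quad t_{jl} = \delta_{jl} \quad (j \neq h,\ j \neq k,\ l \neq h,\ l \neq k)$
then
    $a'_{hh} = a_{kk}; \quad a'_{kk} = a_{hh}; \quad a'_{hk} = a_{kh}; \quad a'_{kh} = a_{hk}; \quad a'_{jl} = a_{jl}$
If
    $t_{hh} = t_{hk} = t_{kh} = t_{kk} = 1, \quad t_{jl} = \delta_{jl}$ as before
then
    $a'_{hh} = a'_{hk} = a'_{kh} = a'_{kk} = a_{hh} + a_{kk} + a_{hk} + a_{kh}; \quad a'_{jl} = a_{jl}$

20. Special types of matrix

In a bilinear form the matrix may be transferred from one vector to
the other if it is replaced by its adjoint:

    $Q(x, z) = \tilde{x}^{*}(Az) = (\widetilde{A^{\dagger}x})^{*}z$    (20.1)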
This is verified by expressing the right-hand side of (20.1) in terms of
components

    $\sum_k \sum_j (a^{\dagger}_{kj} x_j)^{*} z_k = \sum_j \sum_k x_j^{*} a_{jk} z_k$

which is equal to the left-hand side. Specific properties are to be
expected of those bilinear forms in which the matrix may operate
ad libitum on either vector. This is possible if

    $A^{\dagger} = A$    (20.2)

Matrices complying with the condition (20.2) are called Hermitean.
In terms of components (20.2) demands that the diagonal elements
of A be real and that corresponding non-diagonal elements are the
conjugate complex of each other: $a_{jk}^{*} = a_{kj}$. If A is Hermitean and
z = x then

    $Q(x, x) = \tilde{x}^{*}(Ax)$

and is accordingly written without brackets as

    $Q(x, x) = \tilde{x}^{*}Ax$    (20.3)

Q(x, x) is called a Hermitean form and is real, even if x has complex
components. If A in (19.1) is a Hermitean matrix then

    $y = (\partial/\partial\tilde{x}^{*})(\tilde{x}^{*}Ax)$

meaning that every component of y is derived from the form by
differentiating with respect to the corresponding component of $x^{*}$.

If all the elements of a Hermitean matrix are real it is called
real-symmetric and the corresponding Q(x, x) with real x is
called a quadratic form.

Apart from the Hermitean matrix another type of matrix is
important in connection with matrix transformations. The
transformations (19.4) and (19.6) become equal to each other if the
transformation matrix satisfies the condition

    $U^{\dagger} = U^{-1}$    (20.4)
Matrices of this type are called unitary and can be defined by

    $U^{\dagger}U = I$    (20.5)
or
    $\sum_k u^{\dagger}_{jk} u_{km} = \sum_k u_{kj}^{*} u_{km} = \delta_{jm}$    (20.6)

The column vectors of unitary matrices are accordingly normalized
vectors and are mutually orthogonal:

    $\sum_k |u_{kj}|^{2} = 1, \qquad \sum_k u_{kj}^{*} u_{km} = 0 \quad (j \neq m)$

If the elements of a unitary matrix are real the matrix is called
real-orthogonal.

THEOREM (20.A) If in a unitary matrix all the non-diagonal elements
of any column vanish the non-diagonal elements of the corresponding
row must also vanish.

Let $u_{kj} = 0$ for a fixed j and $k \neq j$. Then by (20.6)

    $\sum_k u_{kj}^{*} u_{kj} = |u_{jj}|^{2} = 1$

This implies that $u_{jj} \neq 0$, but, by (20.6) for $l \neq j$,

    $\sum_k u_{kj}^{*} u_{kl} = u_{jj}^{*} u_{jl} = 0$

This implies that $u_{jl} = 0$.

Subsequently we shall be concerned almost exclusively with 
Hermitean and unitary matrices; other types are of less importance. 
One should, however, mention complex-orthogonal matrices. They
are not, in general, unitary; a matrix C is said to be complex-orthogonal if

    $\tilde{C} = C^{-1}$    (20.7)
or
    $\sum_k c_{kj} c_{km} = \delta_{jm}$    (20.8)

A relation between Hermitean and unitary matrices is established
by the following theorem.

THEOREM (20.B) If A is a Hermitean matrix then $V = e^{iA}$ is unitary.

Proof

    $V = I + \sum_{n=1}^{\infty} (iA)^{n}/n! = e^{iA}$
and
    $V^{\dagger} = I + \sum_{n=1}^{\infty} (-iA)^{n}/n! = e^{-iA}$

As all powers of A commute with each other and with I the series
defining V and $V^{\dagger}$ can be evaluated as though the matrices were
ordinary numbers. Hence

    $VV^{\dagger} = e^{iA}e^{-iA} = I$

so that V is proved to be unitary.
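
Theorem (20.B) lends itself to a numerical illustration. The sketch below sums the exponential series to a fixed number of terms for one Hermitean 2 x 2 matrix and checks that the product of the result with its adjoint is close to the unit matrix. The matrix, the number of terms and the helper names are assumptions made for the example; the truncated series is only an approximation of the exponential.

```python
# Sketch: V = exp(iA), built from a truncated power series, is unitary
# to rounding accuracy when A is Hermitean.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def adjoint(M):
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

def exp_series(M, terms=60):
    """I + M + M^2/2! + ... truncated after `terms` terms."""
    n = len(M)
    total = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term now holds M^k / k!
        term = [[entry / k for entry in row] for row in matmul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[1.0, 2 - 1j], [2 + 1j, -0.5]]      # Hermitean: equal to its adjoint
iA = [[1j * a for a in row] for row in A]

V = exp_series(iA)
product = matmul(V, adjoint(V))
deviation = max(abs(product[i][j] - (1 if i == j else 0))
                for i in range(2) for j in range(2))
print(deviation)
```

The printed deviation from the unit matrix is of the order of the rounding error.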
The determinant of any matrix G can be written in terms of a
modulus m and a phase φ:

    $\det G = m e^{i\varphi}$

The determinants of the transposed, adjoint and reciprocal matrices
are then

    $\det \tilde{G} = m e^{i\varphi}, \qquad \det G^{\dagger} = m e^{-i\varphi}, \qquad \det G^{-1} = (1/m) e^{-i\varphi}$

For an Hermitean matrix
    $m e^{i\varphi} = m e^{-i\varphi}$
so that φ = 0, π.
For any unitary matrix
    $m e^{-i\varphi} = (1/m) e^{-i\varphi}$
so that $m^{2} = m = 1$.
For any real- or complex-orthogonal matrix
    $m e^{i\varphi} = (1/m) e^{-i\varphi}$
so that φ = 0, π and $m^{2} = m = 1$.

THEOREM (20.C) It follows that the determinant of any Hermitean
matrix is real, the determinant of a unitary matrix is any complex
number of modulus unity, and the determinant of any real- or
complex-orthogonal matrix is equal to ±1.


THEOREM (20.D) The product of two Hermitean matrices is Hermitean
if the factors commute.

Let $A^{\dagger} = A$, $B^{\dagger} = B$. By (16.7), $(AB)^{\dagger} = B^{\dagger}A^{\dagger} = BA$. Hence
$(AB)^{\dagger} = AB$ if AB = BA.

The product of non-commuting Hermitean matrices is not
Hermitean.

THEOREM (20.E) If A and B are Hermitean matrices the product
C = BAB is also Hermitean.

Proof

    $C^{\dagger} = B^{\dagger}A^{\dagger}B^{\dagger} = BAB = C$

THEOREM (20.F) The product of unitary matrices is unitary.

Let U and V be unitary so that $U^{\dagger} = U^{-1}$, $V^{\dagger} = V^{-1}$. By (16.7)
$(UV)^{\dagger} = V^{\dagger}U^{\dagger} = V^{-1}U^{-1}$. Then $(UV)(UV)^{\dagger} = UVV^{-1}U^{-1} = I$,
proving the unitary property of the product.

THEOREM (20.G) The product of complex-orthogonal matrices is
complex-orthogonal.

Let C, D be complex-orthogonal so that $\tilde{C} = C^{-1}$, $\tilde{D} = D^{-1}$. By
(16.6) $\widetilde{(CD)} = \tilde{D}\tilde{C} = D^{-1}C^{-1}$. Then $(CD)\widetilde{(CD)} = CDD^{-1}C^{-1} = I$,
proving the complex-orthogonal property of the product.
Important properties of matrices which are connected with
transformations will be considered in the next section.

If U is a real-orthogonal matrix in three dimensions then

    det U = ±1
and
    $u_{11}^{2} + u_{21}^{2} + u_{31}^{2} = 1 \qquad u_{11}u_{12} + u_{21}u_{22} + u_{31}u_{32} = 0$
    $u_{12}^{2} + u_{22}^{2} + u_{32}^{2} = 1 \qquad u_{12}u_{13} + u_{22}u_{23} + u_{32}u_{33} = 0$
    $u_{13}^{2} + u_{23}^{2} + u_{33}^{2} = 1 \qquad u_{13}u_{11} + u_{23}u_{21} + u_{33}u_{31} = 0$

Assuming that the determinant is positive and applying the
expansions (13.1 and 13.2) nine further equations are obtained. If
the co-factors are denoted by $\omega_{jk}$, we have

    $\sum_k u_{jk}\omega_{lk} = \delta_{jl} \qquad (j = 1, 2, 3;\ l = 1, 2, 3)$

For example,

    $u_{11}\omega_{11} + u_{12}\omega_{12} + u_{13}\omega_{13} = 1$ and $u_{11}\omega_{21} + u_{12}\omega_{22} + u_{13}\omega_{23} = 0$

As the determinant of the coefficients will not vanish these equations
can be solved with the result that $\omega_{jk} = u_{jk}$. If

    det U = −1    (20.9)

the solution is

    $\omega_{jk} = -u_{jk}$    (20.10)

The spin matrices X, Y, Z as obtained in (10.1) are Hermitean and
obey the following multiplication rules which are readily verified:

    XY = −YX = −iZ
    YZ = −ZY = −iX    (20.11)
    ZX = −XZ = −iY

These matrices are ‘anticommutative’. One also has

    $X^{2} = Y^{2} = Z^{2} = I$    (20.12)

If the exponential functions are defined by their series,

    $e^{aX} = (\cosh a)I + (\sinh a)X$;    (20.13)

similar formulae apply to the exponential functions of Y and Z.
(20.4) and (20.5) are invariant with respect to collineatory
transformations.

Anticommutative matrices of higher dimensions can be derived
from the spin matrices by direct multiplication. For instance I × X,
I × Y are 4-dimensional anticommutative matrices.
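
Formula (20.13) follows from (20.12): since the square of X is the unit matrix, the even powers in the exponential series collect into (cosh a)I and the odd powers into (sinh a)X. The sketch below checks this numerically. Equation (10.1) is not reproduced in this part of the text, so the explicit form of X used here (zeros on the diagonal, ones off the diagonal) is an assumption, as are the value of a and the helper names.

```python
import math

# Sketch: compare the summed exponential series of aX with
# (cosh a) I + (sinh a) X for a matrix X whose square is the unit matrix.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def exp_series(M, terms=40):
    """I + M + M^2/2! + ... truncated after `terms` terms."""
    n = len(M)
    total = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

X = [[0.0, 1.0], [1.0, 0.0]]     # assumed form of the spin matrix X
a = 0.7                          # arbitrary real parameter

X_squared = matmul(X, X)
series = exp_series([[a * x for x in row] for row in X])
closed_form = [[math.cosh(a), math.sinh(a)], [math.sinh(a), math.cosh(a)]]
max_error = max(abs(series[i][j] - closed_form[i][j])
                for i in range(2) for j in range(2))
print(X_squared, max_error)
```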
21. Invariance under transformations

Matrix elements and vector components are changed under
transformations but there are functions of these quantities and relations
between these quantities which do not change and are said to be
invariant under the transformation.

THEOREM (21.A) If A′, B′, C′ are the collinear or congruous
transforms of A, B and C, λ is a scalar and if

    A + λB = C    (21.1)
then
    A′ + λB′ = C′

This is proved by performing the transformation of (21.1) term by
term.

THEOREM (21.B) If A′, B′ and D′ are collinear transforms of A, B
and D and if

    AB = D    (21.2)
it follows that
    A′B′ = D′    (21.3)

This is proved by transforming equation (21.2):

    $D' = T^{-1}DT = T^{-1}ABT = T^{-1}AT\,T^{-1}BT = A'B'$

As $TT^{-1} = I$ the validity of (21.3) follows.

THEOREM (21.C) The determinant and the trace of any matrix are
invariant under a collineatory transformation.

By (18.A) and (18.3) the determinant and the trace of matrix
products remain unchanged if the order of the factors is commuted.
Hence, if B and T are arbitrary matrices with det T ≠ 0,

    $\det B' = \det [T^{-1}(BT)] = \det [(BT)T^{-1}] = \det (BTT^{-1}) = \det B$

Also

    $\operatorname{tr} B' = \operatorname{tr} [T^{-1}(BT)] = \operatorname{tr} [(BT)T^{-1}] = \operatorname{tr} (BTT^{-1}) = \operatorname{tr} B$

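
Theorem (21.C) can be illustrated with exact rational arithmetic. In the sketch below B and T are arbitrary 2 x 2 matrices chosen for the example (T has determinant −1 and therefore a reciprocal); the helper names are likewise introduced only for this illustration.

```python
from fractions import Fraction

# Sketch: trace and determinant of the collineatory transform of B
# coincide with those of B, although the matrix elements change.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return M[0][0] + M[1][1]

def inverse2(M):
    """Reciprocal of a 2x2 matrix; fails if the determinant vanishes."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix has no reciprocal")
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

B = [[Fraction(2), Fraction(1)], [Fraction(-3), Fraction(4)]]
T = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(5)]]

B_prime = matmul(matmul(inverse2(T), B), T)    # collineatory transform of B
print(B_prime, trace(B_prime), det2(B_prime))
```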
THEOREM (21.D) The property of a matrix being Hermitean is
preserved under any congruent transformation.

Let A be Hermitean, so that $A^{\dagger} = A$. Its congruous transform is
$A' = T^{\dagger}AT$. By (16.7) and the associative law of multiplication

    $A'^{\dagger} = T^{\dagger}A^{\dagger}T = T^{\dagger}AT = A'$

Hence A′ is Hermitean.


THEOREM (21.E) The property of a matrix being Hermitean or
unitary is preserved under unitary transformations.

For Hermitean matrices this follows from (21.D) since unitary
transformations are special instances of congruent transformations.

The unitary transform of a unitary matrix is, by (19.6) or (19.4), 
a product of three unitary matrices. This is unitary in consequence of 
theorem (20.F). 


THEOREM (21.F) The property of a matrix being complex-orthogonal 
is preserved under collineatory transformations with a complex- 
orthogonal matrix. 


By definition the reciprocal of a complex-orthogonal matrix is 
complex-orthogonal. By (19.4) the above transform is a product of 
three complex-orthogonal matrices which is complex-orthogonal by 
theorem (20.G). 


THEOREM (21.G) The scalar product of two vectors is invariant
under any unitary transformation of the vectors.

Let $\tilde{a}^{*}$ and b be an arbitrary row and column vector and $(\tilde{a}^{*}b)$ their
scalar product. The transformation is performed by b = Ub′,
$\tilde{a}^{*} = \tilde{a}'^{*}U^{\dagger}$, where U is unitary. Then

    $(\tilde{a}^{*}b) = ((\tilde{a}'^{*}U^{\dagger})(Ub')) = (\tilde{a}'^{*}(U^{\dagger}Ub')) = (\tilde{a}'^{*}b')$

THEOREM (21.H) Let $\tilde{a}$ and b be arbitrary row and column vectors
and C a complex-orthogonal matrix. Then the scalar product
$(\tilde{a}b)$ is invariant under a transformation of the vectors by the
matrix C.

The transformation is performed by $\tilde{a} = \tilde{a}'\tilde{C}$, b = Cb′, so that

    $(\tilde{a}b) = ((\tilde{a}'\tilde{C})(Cb')) = (\tilde{a}'(\tilde{C}Cb')) = (\tilde{a}'(C^{-1}Cb')) = (\tilde{a}'b')$


Real-orthogonal matrices are special instances of unitary and of
complex-orthogonal matrices. A real-orthogonal transformation of
space vectors corresponds to a rotation of the axes of a Cartesian
frame of reference about the origin in such a way that the axes
remain orthogonal. By (21.G) and (21.H) the scalar products of
vectors are invariant under this transformation. This is a
generalization of familiar theory: the length of space vectors and the angles
between different space vectors are not affected by rotation of the
coordinate axes.

Usually physical significance is assigned to the length and scalar
products of vectors but not to the values of their components. The
invariance of scalar products of abstract vectors under unitary
transformations suggests that important properties of abstract vectors are
unitary invariants. One would expect that matrices have important
properties in common with their unitary transforms. This is indeed
so and will be shown in detail in the following chapter.

If A and B are matrices and AB = BA the matrices commute. 
This property is preserved under collineatory transformations on 
account of (21.B). Again if C and D are matrices such that CD = 
— DC, they anticommute and this property is also invariant. 


EXERCISES 


1. Repeat the deductions leading to the theorems (21.D), (21.E), 


(21.F), (21.G) and (21.H), in terms of matrix elements and vector 
components. 


2. Obtain the unitary transform of 


ψ. 
ἌΞΕΙ 2 1 2 
01 2 


using the transformation matrix 


i/f2 8 ; 
U=|i/v2 -+ - 
θυ AY#/2 61/72 
Show that the trace of the transformed matrix is equal to the trace 
of A. 


3. Prove that the row vectors of any unitary matrix are normalized 
and mutually orthogonal. 

4. Show that any unitary matrix remains unitary if it is multiplied 
by a ‘phase factor’, i.e. a complex number of modulus unity. 


5. Identify the Hermitean, unitary or complex-orthogonal property 
of the following matrices: 


i ees ἀν το RE Sle 
i di | aio 1 “4 
5.1 rch πὰ $i 0 
ΠΣ ν᾽ i/v/2 
iv 2/6 0 
ΠΥ i//6 —i/+/2 


| 
.| 


CHAPTER 7 


DIAGONALIZATION OF MATRICES


22. The characteristic equation 


The subject of this chapter is at first presented in an entirely formal 
way; its significance is considered in Section 25. 

A matrix and a vector may be adapted to each other in such a way
that the matrix-vector multiplication has the same effect as
multiplying the vector by some scalar, so that

    Ax = αx    (22.1)

Attempts at identifying the vectors by which equation (22.1) is
solved lead to important relations in matrix algebra.

In attempting to solve equation (22.1) it is seen that this is a
homogeneous system and has as one solution x = 0. This however
does not convey any information on the matrix A. Non-vanishing
solutions exist only for selected values of α which are determined by
the ‘characteristic equation of the matrix A’,

    det (A − αI) = 0    (22.2)

The solutions of this equation are called the eigenvalues of A.
Equation (22.2) is an algebraic equation of the nth degree in α and
has, accordingly, n solutions. They may be complex, even if the
matrix elements are real; some of them may be ‘degenerate’, i.e.
multiple roots of (22.2).

Usually the characteristic equation cannot be solved in terms of a
closed expression involving the matrix elements. For the computation
of the eigenvalues numerical methods have to be employed;
approximate solutions can frequently be obtained in algebraic form. This
will be considered in Section 27.

Let T be a matrix which has a reciprocal, so that $A' = T^{-1}AT$ is
a collinear transform of A. Its characteristic equation is

    $\det (T^{-1}AT - \alpha I) = 0$    (22.3)

By (18.A) and the definition of reciprocal matrices equation (22.3)
can be rearranged:

    $\det (T^{-1}AT - \alpha T^{-1}T) = (\det T^{-1})[\det (A - \alpha I)](\det T) = \det (A - \alpha I) = 0$

It is seen that the characteristic equation is invariant with respect to
collineatory transformations of the matrix. An important conclusion
follows.

THEOREM (22.A) The eigenvalues of any matrix are collineatory
invariants.


If an eigenvalue $\alpha_s$, one of $\alpha_1, \ldots, \alpha_n$, is substituted into equation
(22.1) it has non-vanishing solutions. The solutions are vectors, denoted by
$x_{\cdot 1}, \ldots, x_{\cdot n}$, and are called eigenvectors of the matrix A.

In accordance with the arguments in Section 7 the eigenvectors are
not uniquely determined; they involve at least an undetermined
factor. This factor can, but need not, be fixed by normalization so
that $|x_{\cdot j}|^{2} = 1$. Equally, different eigenvectors of a matrix can, but
need not, be linearly independent.

It will now be shown that any linear dependence of eigenvectors is
possible only if the corresponding eigenvalues are equal to each
other.

This is readily demonstrated for two eigenvectors $x_{\cdot s}$ and $x_{\cdot s+1}$. If
they are linearly dependent so that

    $c_s x_{\cdot s} + c_{s+1} x_{\cdot s+1} = 0 \qquad (c_s, c_{s+1} \neq 0)$    (22.4)

then, by (22.1) and (22.4),

    $A c_s x_{\cdot s} = \alpha_s c_s x_{\cdot s}$
    $A c_{s+1} x_{\cdot s+1} = \alpha_{s+1} c_{s+1} x_{\cdot s+1} = -\alpha_{s+1} c_s x_{\cdot s}$

and, by adding these equations,

    $(\alpha_s - \alpha_{s+1}) c_s x_{\cdot s} = 0$

This is possible only if $\alpha_s = \alpha_{s+1}$.

Assuming that the equality of the eigenvalues has been established
for r linearly dependent eigenvectors, the equality is now
demonstrated for r + 1 vectors. Thus let $x_{\cdot s}, \ldots, x_{\cdot s+r}$ be eigenvectors, let
$\alpha_s = \alpha_{s+1} = \ldots = \alpha_{s+r-1}$, and

    $\sum_{j=s}^{s+r} c_j x_{\cdot j} = 0 \qquad (c_j \neq 0)$    (22.5)

Then, by (22.1) and (22.5),

    $A c_{s+r} x_{\cdot s+r} = \alpha_{s+r} c_{s+r} x_{\cdot s+r}$
    $A \sum_{j=s}^{s+r-1} c_j x_{\cdot j} = \alpha_s \sum_{j=s}^{s+r-1} c_j x_{\cdot j} = -\alpha_s c_{s+r} x_{\cdot s+r}$

and, by adding these equations,

    $(\alpha_{s+r} - \alpha_s) c_{s+r} x_{\cdot s+r} = 0$

This is possible only if $\alpha_{s+r} = \alpha_s$.

It follows then that the eigenvalues corresponding to any number
of linearly dependent eigenvectors must be equal.

THEOREM (22.B) All eigenvectors of a matrix are linearly independent
if the corresponding eigenvalues all differ from each other. However,
if a set of eigenvectors corresponds to one and the same eigenvalue
these vectors may or may not be linearly independent.


23. Collineatory and unitary transformations

Equation (22.1), which determines the eigenvalues and eigenvectors
of a matrix A, can be regarded as a component of a matrix equation

    AX = XA′    (23.1)

X is an unknown matrix and A′ an unknown diagonal matrix.
The diagonal elements of A′ are the eigenvalues of A, the columns
of X are its eigenvectors. Provided that X has a reciprocal (23.1) can
be written as a collineatory transformation

    $X^{-1}AX = A'$    (23.2)

The transform A′ is a diagonal matrix.

By (21.C) the trace and determinant of matrices are collineatory
invariants. By (23.2) they are respectively the sum and product of the
eigenvalues of A. Therefore these two functions of the eigenvalues
can be derived from the matrix A in its original form without
performing the transformation, provided the existence of a
transformation matrix is established.

By (15.B) and (17) the matrix X has a reciprocal if the
eigenvectors are linearly independent. By (22.B) this condition is complied
with if the characteristic equation has no multiple solution. This is
sufficient for the existence of a collineatory transformation to the
diagonal form.

At this stage a simple example is instructive. Let A be a 2 × 2
matrix

    $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$    (23.3)

Its characteristic equation has the form

    $\alpha^{2} - \alpha(a_{11} + a_{22}) + (a_{11}a_{22} - a_{12}a_{21}) = 0$

The coefficients of this equation are readily identified as the
negative trace and the determinant of A. The eigenvalues are

    $\alpha = \tfrac{1}{2}\{a_{11} + a_{22} \pm [(a_{11} - a_{22})^{2} + 4a_{12}a_{21}]^{1/2}\}$    (23.4)

In terms of normalized eigenvectors the transformation matrix can
be written as

    $X = \begin{pmatrix} 2a_{12}/N_1 & 2a_{12}/N_2 \\ N_1^{-1}\{a_{22} - a_{11} - [(a_{22} - a_{11})^{2} + 4a_{12}a_{21}]^{1/2}\} & N_2^{-1}\{a_{22} - a_{11} + [(a_{22} - a_{11})^{2} + 4a_{12}a_{21}]^{1/2}\} \end{pmatrix}$    (23.5)

where

    $N_{1,2}^{2} = 4|a_{12}|^{2} + |a_{22} - a_{11} \mp [(a_{22} - a_{11})^{2} + 4a_{12}a_{21}]^{1/2}|^{2}$

The two eigenvectors are linearly independent unless

    $(a_{22} - a_{11})^{2} + 4a_{12}a_{21} = 0.$

In this case the matrix X has no reciprocal and A cannot be
diagonalized by any collineatory transformation (unless
$a_{12} = a_{21} = 0$, when A is already diagonal).
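
The 2 x 2 example can be followed numerically. The sketch below evaluates the eigenvalues from (23.4) and, as an assumption about the intended form of (23.5), takes the unnormalized eigenvectors as the pair (2a₁₂, a₂₂ − a₁₁ ∓ root); it then checks equation (22.1) for a symmetric matrix and shows how the two vectors coincide when the square root vanishes. The matrices and function names are chosen only for the illustration.

```python
import cmath

# Sketch: eigenvalues of a 2x2 matrix from the closed formula, with
# unnormalized eigenvectors (2*a12, a22 - a11 -/+ root).

def eigen_2x2(a11, a12, a21, a22):
    root = cmath.sqrt((a11 - a22) ** 2 + 4 * a12 * a21)
    values = [(a11 + a22 - root) / 2, (a11 + a22 + root) / 2]
    vectors = [[2 * a12, a22 - a11 - root], [2 * a12, a22 - a11 + root]]
    return values, vectors

def residual(A, alpha, x):
    """Largest component of A x - alpha x; zero for an eigenvector."""
    return max(abs(A[i][0] * x[0] + A[i][1] * x[1] - alpha * x[i])
               for i in range(2))

A = [[2, 1], [1, 2]]
values, vectors = eigen_2x2(2, 1, 1, 2)
residuals = [residual(A, alpha, x) for alpha, x in zip(values, vectors)]

# Degenerate case: the square root vanishes and both vectors coincide,
# so no transformation matrix with a reciprocal can be formed from them.
J = [[1, 1], [0, 1]]
j_values, j_vectors = eigen_2x2(1, 1, 0, 1)

print(values, residuals, j_vectors)
```

The formula for the eigenvectors presupposes that a₁₂ does not vanish; a matrix with a₁₂ = 0 would need a separate treatment.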

In physics matrices of importance are usually Hermitean or uni- 
tary. For matrices of this kind the existence of a transformation 
matrix can be established even if the characteristic equation has 
multiple solutions. Moreover the transformation matrix is unitary or 
may be chosen as unitary. A direct proof of this would require the 
demonstration of linear independence and orthogonality of the 
eigenvectors. Here an alternative argument is put forward: it will 
be shown that a unitary transformation matrix can be constructed 
by a finite set of well-defined algebraic operations. 

A Hermitean or a unitary matrix A is to be transformed to

    $A' = U^{-1}AU$    (23.6)

where U is unitary and A′ is diagonal. The transformation matrix is
expressed as a product of unitary matrices

    $U = U^{(1)}U^{(2)}U^{(3)} \ldots U^{(n-1)}$    (23.7)

The matrix $U^{(j)}$ is formed from two sub-matrices, a (j − 1)-dimensional
unit matrix and an (n + 1 − j)-dimensional unitary sub-matrix
to be denoted by $V^{(j)}$; $V^{(1)} = U^{(1)}$. The matrix elements of
$U^{(j)}$ which are outside the two sub-matrices are assumed to be zero.

    $U^{(j)} = \begin{pmatrix} I & 0 \\ 0 & V^{(j)} \end{pmatrix}$    (23.8)

It is readily verified that the matrix (23.8) is unitary for every j.
The following transforms of A are now defined, with $A^{(0)} = A$:

    $A^{(j)} = U^{(j)-1}A^{(j-1)}U^{(j)} \qquad (j = 1, \ldots, n - 1)$    (23.9)

It is attempted to determine these transforms such that they
consist of two sub-matrices, all elements outside these being zero. One
sub-matrix is taken to be a diagonal matrix $D^{(j)}$ of j dimensions, the
other has n − j dimensions and is denoted by $B^{(j)}$.

    $A^{(j)} = \begin{pmatrix} D^{(j)} & 0 \\ 0 & B^{(j)} \end{pmatrix}$    (23.10)

By theorem (21.E) the matrices $A^{(j)}$ and all their sub-matrices are
Hermitean or unitary if A is Hermitean or unitary respectively. Also
by (22.3) the characteristic equations of all the transforms of A are
identical so that the diagonal elements of $D^{(j)}$ and the eigenvalues
of $B^{(j)}$ are eigenvalues of A.

The transformation (23.9) leaves the sub-matrix $D^{(j-1)}$ unchanged
and transforms the sub-matrix $B^{(j-1)}$ according to

    $B^{(j-1)} \rightarrow V^{(j)-1}B^{(j-1)}V^{(j)}$    (23.11)

In order to comply with (23.10) the effect of the transformation
(23.11) must be a substitution of zero for the non-diagonal elements
in the jth rows and columns of $A^{(j-1)}$, that is, in the first row and
column of $B^{(j-1)}$. It is, in fact, sufficient to show
that this substitution is made in the columns; the corresponding
substitution in the rows then follows, for Hermitean matrices on
account of their definition, for unitary matrices by theorem (20.A).

In order to prove that an appropriate transformation matrix exists
it is sufficient to exhibit an appropriate matrix $U^{(1)}$; construction of
the subsequent transformation matrices does not involve any
additional problems.

Let $\alpha_1$ be an eigenvalue of A; the corresponding normalized
eigenvector, which need not be defined uniquely, is then the first
column of $U^{(1)}$. The first column of $A^{(1)}$ is given by

    $a^{(1)}_{j1} = \sum_k \sum_m u^{(1)*}_{kj} a_{km} u^{(1)}_{m1} = \alpha_1 \sum_k u^{(1)*}_{kj} u^{(1)}_{k1} = \alpha_1 \delta_{j1}$

in accordance with (23.10). In order to complete the proof it is
necessary to demonstrate the existence of a unitary matrix $U^{(1)}$. The
precise values of the matrix elements in the 2nd to nth columns are
not relevant. These column vectors can accordingly be deduced from
any set of n − 1 vectors which, together with $u^{(1)}_{\cdot 1}$, are a set of linearly
independent vectors. These n − 1 vectors may be, for instance, of the
form (2.3). The set of linearly independent vectors is converted to a
set of orthogonal normalized vectors by the use of equations (3.5).
Thus the matrix $U^{(1)}$ is shown to exist.
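
The first step of this construction can be sketched in a few lines. The example below assumes that one eigenvector of a real-symmetric 3 x 3 matrix is already known; it supplements this vector by unit vectors, orthonormalizes the set by the Gram-Schmidt procedure (used here in place of equations (3.5), which are not reproduced in this part of the text), and forms the first transform. The matrix, the eigenvector and all function names are assumptions made for the illustration.

```python
import math

# Sketch of the first reduction step: build a unitary U1 whose first column
# is a normalized eigenvector, then form A1 = adjoint(U1) A U1.

def inner(u, v):
    """Scalar product with complex conjugation of the first vector."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors in the given order."""
    basis = []
    for v in vectors:
        w = [complex(c) for c in v]
        for b in basis:
            coeff = inner(b, w)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis

def first_transform(A, eigenvector):
    n = len(A)
    # Omit the unit vector along the largest eigenvector component, so that
    # the eigenvector and the remaining unit vectors are linearly independent.
    pivot = max(range(n), key=lambda i: abs(eigenvector[i]))
    units = [[1 if i == k else 0 for i in range(n)]
             for k in range(n) if k != pivot]
    columns = gram_schmidt([eigenvector] + units)
    U = [[columns[j][i] for j in range(n)] for i in range(n)]
    A1 = [[sum(U[k][j].conjugate() * A[k][m] * U[m][l]
               for k in range(n) for m in range(n))
           for l in range(n)] for j in range(n)]
    return U, A1

A = [[4, 1, 1], [1, 3, 0], [1, 0, 3]]
U1, A1 = first_transform(A, [0, 1, -1])    # A applied to (0, 1, -1) gives 3 (0, 1, -1)
for row in A1:
    print([round(abs(entry), 6) for entry in row])
```

In the transform the first row and column contain only the eigenvalue 3 on the diagonal, while the remaining 2 x 2 block is still non-diagonal and is treated in the next step in the same way.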
As a conclusion two theorems can be formulated. 


THEOREM (23.A) Hermitean and unitary matrices can be diagonalized
by a unitary transformation.

THEOREM (23.B) The eigenvalues of Hermitean matrices are real; the
eigenvalues of unitary matrices have a modulus equal to unity.

This follows from the definitions and from theorem (21.E). It is not
difficult to show that real symmetric matrices can be diagonalized by
real-orthogonal transformation matrices.

It follows from the unitary invariance of the trace and determinant
of matrices that the trace is the sum of the eigenvalues and the
determinant their product. As an example of a Hermitean matrix
consider (23.3), assuming that $a_{11}$ and $a_{22}$ are real and $a_{21} = a_{12}^{*}$. Then
$(a_{11} - a_{22})^{2} + 4|a_{12}|^{2}$ is necessarily non-negative so that the eigenvalues
(23.4) must be real. The scalar product of the eigenvectors is,
according to (23.5),

    $(\tilde{x}_{\cdot 1}^{*} x_{\cdot 2}) = \frac{1}{N_1 N_2}\{4|a_{12}|^{2} + (a_{22} - a_{11})^{2} - [(a_{22} - a_{11})^{2} + 4|a_{12}|^{2}]\} = 0$

so that the matrix X is proved unitary.

A simple example for a unitary matrix is

    $C = \begin{pmatrix} c\,e^{i\varphi} & (1 - c^{2})^{1/2} \\ (1 - c^{2})^{1/2} & -c\,e^{-i\varphi} \end{pmatrix}$

where c and φ are real and 0 < c < 1. The eigenvalues are

    $\gamma = \pm(1 - c^{2}\sin^{2}\varphi)^{1/2} + ic\sin\varphi$

Their modulus square is equal to unity. The transformation matrix
is

    $X = \begin{pmatrix} 2(1 - c^{2})^{1/2}/N_1 & 2(1 - c^{2})^{1/2}/N_2 \\ -2N_1^{-1}[c\cos\varphi - (1 - c^{2}\sin^{2}\varphi)^{1/2}] & -2N_2^{-1}[c\cos\varphi + (1 - c^{2}\sin^{2}\varphi)^{1/2}] \end{pmatrix}$

It can be verified that the eigenvectors are orthogonal.
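
The unitary example can be verified numerically for particular values of c and φ. The sketch below assumes the matrix C in the form with elements c e^{iφ}, (1 − c²)^{1/2}, (1 − c²)^{1/2} and −c e^{−iφ}; the numerical values of c and φ and the variable names are arbitrary choices for the illustration.

```python
import cmath
import math

# Sketch: check that C is unitary, that the quoted eigenvalues satisfy the
# characteristic equation, and that their modulus is unity.

c, phi = 0.6, 0.9                      # arbitrary values with 0 < c < 1
s = math.sqrt(1 - c * c)

C = [[c * cmath.exp(1j * phi), s],
     [s, -c * cmath.exp(-1j * phi)]]

# Elements of adjoint(C) C; these should form the unit matrix.
gram = [[sum(C[k][j].conjugate() * C[k][m] for k in range(2))
         for m in range(2)] for j in range(2)]
unit_deviation = max(abs(gram[j][m] - (1 if j == m else 0))
                     for j in range(2) for m in range(2))

r = math.sqrt(1 - (c * math.sin(phi)) ** 2)
gammas = [r + 1j * c * math.sin(phi), -r + 1j * c * math.sin(phi)]

tr = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
char_residuals = [abs(g * g - tr * g + det) for g in gammas]
moduli = [abs(g) for g in gammas]

print(unit_deviation, char_residuals, moduli)
```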
THEOREM (23.C) If A and B are Hermitean matrices with non-negative
eigenvalues the eigenvalues of A + B must also be non-negative.

Let U be a unitary matrix and $A' = U^{-1}AU$ be diagonal. Then
the diagonal elements of A are

    $a_{jj} = \sum_k u_{jk}\, a'_{kk}\, u_{jk}^{*} = \sum_k a'_{kk} |u_{jk}|^{2}$

and they are according to premises non-negative. By a similar
argument it follows that the diagonal elements of all unitary
transforms of A are non-negative.

The sum A + B is Hermitean. The diagonal elements of all unitary
transforms of A and of B are accordingly non-negative, and if V is
unitary the diagonal elements of $V^{-1}AV + V^{-1}BV = V^{-1}(A + B)V$
are also non-negative. Hence, if V is the matrix diagonalizing A + B,
the resulting diagonal elements are the eigenvalues and must be
non-negative.


24. Congruent transformations 


Hermitean matrices can be diagonalized by congruent trans- 
formations which are not unitary. These transformations are not so 
important as those considered in the preceding section but are useful 
in various contexts. 

Let A be a Hermitean matrix, T a transformation matrix and
$A' = T^{\dagger}AT$ be a diagonal matrix. Let T be a product of the form
(23.7) where T and $T^{(j)}$ are substituted for U and $U^{(j)}$ respectively.
The matrices $T^{(j)}$ have the form (23.8) but $V^{(j)}$ is not unitary.
Transforms of A are defined according to (23.9) with $T^{(j)}$ and $T^{(j)\dagger}$
substituted for $U^{(j)}$ and $U^{(j)-1}$; these transforms are Hermitean. It
is attempted to obtain them in the form (23.10), but the diagonal
elements of $D^{(j)}$ are not derived from the characteristic equation
and not related in any simple way to the eigenvalues of A. The
sub-matrices $B^{(j)}$ are transformed according to (23.11) with $V^{(j)\dagger}$
substituted for $V^{(j)-1}$.

So far the procedure is completely analogous to the corresponding
argument in Section 23; however, the matrices are not derived from
eigenvectors. Neither the transformation nor the resulting matrix A′
is uniquely defined.

The matrices $T^{(1)}$ and $V^{(j)}$ can be, but need not be, defined
according to

    $t^{(1)}_{kk} = 1; \quad t^{(1)}_{1k} = -a_{1k}/a_{11} \ (k > 1); \quad t^{(1)}_{mk} = \delta_{mk} \ (m > 1)$
    $v^{(j)}_{kk} = 1; \quad v^{(j)}_{jk} = -b^{(j-1)}_{jk}/b^{(j-1)}_{jj} \ (k > j); \quad v^{(j)}_{mk} = \delta_{mk} \ (m > j)$

These definitions are applicable only if $a_{11}$ and the $b^{(j-1)}_{jj}$ do not
vanish. Alternative transformation matrices are defined after
performing a preliminary congruent transformation by which matrix
elements are permuted or linearly combined (see Section 19, example
2).
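
The elimination just described can be sketched for a real-symmetric matrix, for which the adjoint of T reduces to its transpose. The code below is an illustration under that restriction and under the assumption that each transformation matrix is the unit matrix with its jth row replaced by the elements −a_jk/a_jj; the example matrix and the function names are hypothetical.

```python
from fractions import Fraction

# Sketch: congruent diagonalization of a real-symmetric matrix by successive
# elimination; exact rational arithmetic avoids rounding.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    return [list(row) for row in zip(*M)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def congruent_diagonalize(A):
    n = len(A)
    current = [[Fraction(x) for x in row] for row in A]
    T_total = identity(n)
    for j in range(n - 1):
        pivot = current[j][j]
        if pivot == 0:
            raise ValueError("vanishing diagonal element: a preliminary "
                             "congruent transformation is required")
        T = identity(n)
        for k in range(j + 1, n):
            T[j][k] = -current[j][k] / pivot
        current = matmul(transpose(T), matmul(current, T))
        T_total = matmul(T_total, T)
    return current, T_total

A = [[2, 1, 1], [1, 3, 0], [1, 0, 4]]
D, T = congruent_diagonalize(A)
print([D[i][i] for i in range(3)])
```

For this matrix the diagonal elements obtained are 2, 5/2 and 17/5. Their product equals det A = 17 because det T = 1, but their sum differs from the trace of A, so that they are not the eigenvalues.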

The following transformations are of greater importance. Let A
and B be Hermitean matrices and let B comply with the condition
that all its eigenvalues $\beta_j$ are positive. It follows then that the matrix
$B^{1/2}$ is also Hermitean and that both B and $B^{1/2}$ have a Hermitean
reciprocal. Thus the two matrices are transformed by a common
congruent transformation:

    $A' = B^{-1/2}AB^{-1/2} \qquad B' = B^{-1/2}BB^{-1/2} = I$

The matrix A′ is Hermitean, by theorem (20.E); the matrix B′ is
diagonal and invariant under all unitary transformations. In the next
stage A′ is diagonalized by a unitary transformation:

    $A'' = W^{-1}A'W = W^{\dagger}A'W$

whereas B′ remains unchanged under this transformation. Thus
both matrices have been diagonalized by a congruent transformation.

The transformation matrix is $T = B^{-1/2}W$. It is not necessary and it
would be cumbersome to derive $B^{-1/2}$ or A′ explicitly. Instead it is
noted that the collineatory transformation

    $B^{-1/2}A'B^{1/2} = B^{-1}A$

converts A′ to $B^{-1}A$. By (22.A) the eigenvalues of A′ are equal to the
eigenvalues of $B^{-1}A$. One can accordingly start with the equation
determining the eigenvectors of $B^{-1}A$:

    $B^{-1}Ax = \gamma x$
or
    $Ax = \gamma Bx$

The eigenvalues of $B^{-1}A$ are accordingly determined by the
equation

    det (A − γB) = 0    (24.1)

It has a more involved structure than the characteristic equations of
single matrices since every term of the determinant depends on the
unknown γ.
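
For two-dimensional matrices equation (24.1) is a quadratic equation in γ and can be solved in closed form. The sketch below does this for one pair of real-symmetric matrices, B having positive eigenvalues, and checks that Ax = γBx is satisfied. The matrices, the restriction to real elements and the function names are assumptions made for the example.

```python
import math

# Sketch: roots of det(A - gamma B) = 0 for real-symmetric 2x2 matrices,
#   det(B) gamma^2 - mixed * gamma + det(A) = 0.

def generalized_eigenvalues_2x2(A, B):
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    det_b = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    mixed = (A[0][0] * B[1][1] + A[1][1] * B[0][0]
             - A[0][1] * B[1][0] - A[1][0] * B[0][1])
    # Real roots are assumed (A symmetric, B with positive eigenvalues).
    root = math.sqrt(mixed * mixed - 4 * det_a * det_b)
    return [(mixed - root) / (2 * det_b), (mixed + root) / (2 * det_b)]

def eigenvector(A, B, gamma):
    """A non-vanishing solution of (A - gamma B) x = 0, taken from the first row."""
    return [A[0][1] - gamma * B[0][1], -(A[0][0] - gamma * B[0][0])]

def residual(A, B, gamma, x):
    """Largest component of A x - gamma B x."""
    return max(abs(sum((A[i][k] - gamma * B[i][k]) * x[k] for k in range(2)))
               for i in range(2))

A = [[2.0, 1.0], [1.0, 2.0]]
B = [[2.0, 0.0], [0.0, 1.0]]

gammas = generalized_eigenvalues_2x2(A, B)
residuals = [residual(A, B, g, eigenvector(A, B, g)) for g in gammas]
print(gammas, residuals)
```

For these matrices the equation reads 2γ² − 6γ + 3 = 0, with the roots (3 ∓ √3)/2.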

In this problem the eigenvectors x are not connected with the 
transformation matrix T. In order to find the latter it may be con- 
venient to remember that (A’)? = B-1A2B can be calculated by 
matrix multiplication, that eigenvectors of A’ are identical with 
eigenvectors of (A’)? and that the eigenvalues of (A’)? are the 
squares of the quantities y as obtained from (24.1). 

This result is applicable to Hermitean (or real quadratic) forms.
If A is Hermitean, $Q(x, x) = \tilde{x}^{*}Ax$, and x = Tx′, then $Q = \tilde{x}'^{*}A'x'$
and, if A′ is a diagonal matrix, then Q is converted to a sum of
moduli squares, or squares, respectively.

As this congruent transformation can be performed in different
ways it is possible to select particular transformations by subsidiary
conditions. In particular one can diagonalize two matrices
simultaneously so that two Hermitean or quadratic forms can be
simultaneously converted to sums of moduli squares, or squares,
respectively.

In conclusion it may be of interest to point out a remarkable
property of Hermitean forms. Regarding Q(x, x) as a function of
the vectors $\tilde{x}^{*}$ and x, let these vectors be varied, subject to the
condition that $|x|^{2}$ remains fixed. The corresponding values of Q will
include maximum and minimum and saddle points. They can be
identified by setting the differential of Q equal to zero.

    $(\partial/\partial x)(Q - \alpha|x|^{2}) = \tilde{x}^{*}A - \alpha\tilde{x}^{*} = 0$
    $(\partial/\partial\tilde{x}^{*})(Q - \alpha|x|^{2}) = Ax - \alpha x = 0$    (24.2)

Here α is a Lagrange multiplier, introduced in order to maintain the
fixed value of $|x|^{2}$. The unfamiliar differentiation with respect to a
vector is performed by differentiation with respect to every vector
component, the resulting differential coefficients being components
of another vector. The result, that is equation (24.2), is the same as
equation (22.1) by which eigenvectors are determined. Hence the
following theorem is established.

THEOREM (24.A) The maximum and minimum and saddle points of
a Hermitean form $\tilde{x}^{*}Ax$ are found by substituting a normalized
eigenvector for x; the corresponding values of the Hermitean form
are the eigenvalues of A.
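
Theorem (24.A) can be made plausible by a direct scan. The sketch below restricts itself to a real-symmetric 2 x 2 matrix and to real unit vectors (cos t, sin t); it evaluates the quadratic form on a grid of angles and compares the largest and smallest values with the eigenvalues, which for the matrix chosen are 3 and 1. The matrix, the grid and the function name are assumptions made for the illustration.

```python
import math

# Sketch: extreme values of the quadratic form x A x over unit vectors,
# found by scanning the angle of the vector.

A = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues 1 and 3

def form(A, x):
    """Quadratic form with a real vector x."""
    return sum(x[j] * A[j][k] * x[k] for j in range(2) for k in range(2))

steps = 3600
values = [form(A, (math.cos(math.pi * n / steps), math.sin(math.pi * n / steps)))
          for n in range(steps)]

q_max = max(values)
q_min = min(values)
print(q_max, q_min)
```

The maximum is attained for the direction (1, 1) and the minimum for the direction (1, −1), which are the two eigenvectors of the matrix.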


25. Discussion 


After presenting a number of theorems and rules concerning the 
transformation of matrices to the diagonal form, it is appropriate 
to consider their significance and the use that can be made of them. 

It may be noticed that, by diagonalization, the specification of 
matrices is rendered more concise. The number of non-vanishing 
elements in a diagonal matrix is n, as distinct from the usual n².
Moreover the eigenvalues of a matrix are common to a whole set of 
matrices which are mutually related by collineatory transformations. 
Also, the knowledge of an eigenvector allows one in special instances 
(cf. (22.1)) to replace a matrix by a single number. 

The congruent diagonalization of matrices is frequently helpful 
because entities in physics and in statistics are often expressible as 
quadratic or Hermitean forms. They can be transformed to a sum of
squares, each term involving a single variable only. Frequently the
possibility of handling these quadratic forms mathematically depends
on the separation of variables by diagonalization.

The multiplication of any pair of diagonal matrices is commutative. 
By theorem (21.B) this property of the matrices is preserved if both 
are transformed by one and the same collineatory transformation. 
As collineatory transformations can be inverted it follows that all 
pairs of commuting matrices can be diagonalized simultaneously. 
Thus all powers of a matrix can be diagonalized simultaneously and 
any matrix polynomial can be diagonalized term by term, implying 
one and the same transformation matrix. 

The same is true for infinite power series as defined in Section 16. 
For these series the criterion of convergence is most readily formu- 
lated if they are diagonalized: every diagonal element consists of 
a power series in a single variable; the coefficients are the same in 
all elements and the variable assumes all eigenvalues in succession. 
The convergence of these series can then be assessed by any accepted 
criterion. Convergence of the matrix series means that convergence
in all diagonal elements is established.

Moreover the method of diagonalization can be used for defining 
irrational or transcendental functions of matrices without any 
appeal to expansions in terms of matrix powers. If $f(x)$ is any
analytic function and the eigenvalues $(\alpha_1, \ldots, \alpha_n)$ of the matrix A are
successively substituted for x, then the diagonal matrix with the
elements $f(\alpha_j)$ defines the diagonalized matrix function $f(A)$. It can
subsequently be converted to different forms by collineatory trans- 
formations. 
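
As an illustration of this definition, the following sketch forms a function of a matrix from its eigenvalues and eigenvectors. It is a minimal example under stated assumptions: the matrix is the hypothetical real symmetric matrix with eigenvalues 3 and 1 and normalized eigenvectors (1, 1)/√2 and (1, −1)/√2, and the helper names `matmul`, `transpose` and `matrix_function` are introduced here for convenience.

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

r = 1.0 / math.sqrt(2.0)
V = [[r, r],
     [r, -r]]              # columns: normalized eigenvectors (assumed example)
alphas = [3.0, 1.0]        # corresponding eigenvalues

def matrix_function(f):
    """f(A) = V diag(f(alpha_1), f(alpha_2)) V^T for the real orthogonal V above."""
    D = [[f(alphas[0]), 0.0],
         [0.0, f(alphas[1])]]
    return matmul(matmul(V, D), transpose(V))

A = matrix_function(lambda a: a)      # reproduces the matrix [[2, 1], [1, 2]]
root = matrix_function(math.sqrt)     # an irrational function of A
root_squared = matmul(root, root)     # should agree with A
```

The square root obtained in this way is converted back from the diagonal form by the same transformation matrix V, and squaring it recovers A.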

These mathematical aspects of diagonalization are not sufficient to 
explain its significance in physics; it will be seen that the eigenvalues 
of matrices are frequently quantities playing a leading role in physi- 
cal theories and in their experimental implications, such as the natural 
frequencies of oscillations. In the following section a special type of 
matrix is surveyed; even here the concepts of diagonalization will 
prove indispensable. 


26. Projective matrices 
The ‘projection’ of a space vector r on another vector s is defined by

$$ \mathbf{t} = \mathbf{s}\,(r/s)\cos\beta \tag{26.1} $$

β being the angle between the two vectors. The magnitude of the
projection,

$$ t = r\cos\beta \tag{26.2} $$

is the scalar product of r with a unit vector in the direction of s: the
vector t is a multiple of s.


The concept of projection can be applied to abstract vectors as is 
shown in this section. 
Let A′ be a matrix and A its unitary transform, then

$$ a_{hk} = \sum_l \sum_m u_{hl}\, a'_{lm}\, u^*_{km} $$

If A′ is a diagonal matrix, $a'_{lm} = \alpha_l \delta_{lm}$, and therefore

$$ a_{hk} = \sum_l u_{hl}\, \alpha_l\, u^*_{kl} $$

This can be written as a matrix equation

$$ A = \sum_l \alpha_l P(l) \tag{26.3} $$

where

$$ p_{hk}(l) = u_{hl}\, u^*_{kl} \tag{26.4} $$

By (26.3) every matrix that can be diagonalized by a unitary
transformation admits an expansion progressing in eigenvalues. The
expansion coefficients are matrices which can be expressed in terms
of eigenvectors. Their properties are readily recognized by squaring
equation (26.4),

$$ p^{(2)}_{hk}(l) = \sum_j p_{hj}(l)\, p_{jk}(l) = \sum_j u_{hl} u^*_{jl} u_{jl} u^*_{kl} = u_{hl} u^*_{kl} = p_{hk}(l) $$

or

$$ P^2(l) = P(l) \tag{26.5} $$

All positive powers of $P(l)$ are, accordingly, equal to each other.

The matrices $P(l)$ are, by (26.4), Hermitean and can accordingly
be diagonalized by unitary transformation. Equation (26.5) must be
valid before and after diagonalization. It follows that the eigenvalues
of $P(l)$ must be 0 or 1, since only these two numbers are equal to
their squares. The trace is

$$ \mathrm{tr}\, P(l) = \sum_h p_{hh}(l) = \sum_h u_{hl} u^*_{hl} = 1 $$

Thus there must be a single eigenvalue equal to unity, the others
being zero.
Further properties of the matrices $P(l)$ are derived by considering
the product

$$ y(l) = P(l)\,x $$

where x is an arbitrary vector. This product is evaluated by

$$ y_h(l) = \sum_k u_{hl}\, u^*_{kl}\, x_k $$

and can be written as

$$ y(l) = u_l\,(u_l \cdot x) \tag{26.6} $$

This equation is similar to (26.1). The matrix $P(l)$ converts the vector
x into a vector proportional to $u_l$ which is a unit vector; the magnitude
of y is equal to the scalar product of the two vectors. y can be
regarded as the projection of x on $u_l$. For this reason $P(l)$ is called
a projective matrix. It can be shown that the matrix product
$P(l)P(m)$ vanishes for $l \neq m$.

Matrices with eigenvalues 0 and 1 are generally called projective
matrices.
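
The properties of projective matrices stated in this section can be verified on a small example. The sketch below assumes a hypothetical 2 x 2 Hermitean matrix with eigenvalues 3 and 1; the helper names `projector`, `matmul` and `close` are introduced for the illustration and are not taken from the text.

```python
import math

def projector(u):
    """Elements p_hk = u_h * conj(u_k) of a projective matrix, as in (26.4)."""
    return [[uh * complex(uk).conjugate() for uk in u] for uh in u]

def matmul(X, Y):
    return [[sum(X[h][j] * Y[j][k] for j in range(len(Y))) for k in range(len(Y[0]))]
            for h in range(len(X))]

def close(X, Y, tol=1e-12):
    """True if two matrices agree element by element within tol."""
    return all(abs(X[h][k] - Y[h][k]) < tol
               for h in range(len(X)) for k in range(len(X[0])))

# Assumed example: A is Hermitean with eigenvalues 3 and 1.
A = [[2.0, 1j],
     [-1j, 2.0]]
r = 1.0 / math.sqrt(2.0)
u1 = [1j * r, r]     # normalized eigenvector for the eigenvalue 3
u2 = [-1j * r, r]    # normalized eigenvector for the eigenvalue 1

P1, P2 = projector(u1), projector(u2)

# Expansion (26.3): A = sum over l of alpha_l P(l)
expansion = [[3.0 * P1[h][k] + 1.0 * P2[h][k] for k in range(2)] for h in range(2)]
zero = [[0.0, 0.0], [0.0, 0.0]]
```

With these definitions `matmul(P1, P1)` agrees with `P1`, the trace of `P1` is unity, `matmul(P1, P2)` vanishes and `expansion` reproduces A.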


27. The practice of diagonalization 


Algebraic expressions of the eigenvalues, such as (23.4), can be 
obtained in exceptional instances only. However, if the matrix 
elements are given numerically the eigenvalues can be derived 
numerically to any degree of accuracy. On the other hand there are
algebraic methods for approximate diagonalization. Problems of
diagonalization can usually be solved by routine methods but may
sometimes be a problem for research.

Numerical diagonalization is an extensive subject and could not 
be dealt with in a volume like the present. However, numerical and 
approximate algebraical methods have wider implications and some 
aspects of these methods are accordingly reviewed in the present 
section. 

All numerical procedures for solving algebraic equations of 
higher degrees are applicable to the characteristic equation. Also, 
there are methods specially adapted to matrices. As an example 
consider a technique which is particularly well suited to Hermitean 
matrices. If the highest eigenvalue is markedly larger than the others 
the multiplication of a unit vector by a high matrix power will 
convert it to a multiple of the eigenvector of the highest eigenvalue. 
It is accordingly convenient to begin with some arbitrary unit vector. 
This is multiplied by the matrix and the resultant vector is renor- 
malized. After a sufficient number of repetitions, multiplication by 
the matrix will multiply the vector by a constant factor, which is 
equal to the highest eigenvalue. The corresponding eigenvector is 
then readily identified and can be used for constructing the projec- 
tive matrix as defined by (26.4). By using equation (26.3) it is now 
possible to construct a matrix which has the same eigenvalues as the 
original matrix except the highest. If the second highest eigenvalue 
is larger than the others the method of computation can be repeated. 
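
The procedure just described can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the matrix is real and symmetric (so that complex conjugation plays no part), its eigenvalues 11, 2 and 1 are well separated and positive, and the function names `power_method` and `deflate` are hypothetical. The deflation step subtracts the highest eigenvalue times its projective matrix, in accordance with (26.3), so that this eigenvalue is replaced by zero.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]

def power_method(A, iterations=200):
    """Repeated multiplication and renormalization of an arbitrary unit vector."""
    x = normalize([1.0] * len(A))
    for _ in range(iterations):
        x = normalize(matvec(A, x))
    Ax = matvec(A, x)
    eigenvalue = sum(xi * yi for xi, yi in zip(x, Ax))  # factor by which x is multiplied
    return eigenvalue, x

def deflate(A, eigenvalue, u):
    """Subtract eigenvalue * P, where p_hk = u_h u_k for a real unit vector u."""
    n = len(A)
    return [[A[h][k] - eigenvalue * u[h] * u[k] for k in range(n)] for h in range(n)]

# Assumed example matrix with eigenvalues 11, 2 and 1.
A = [[2.0, 0.0, 0.0],
     [0.0, 3.0, 4.0],
     [0.0, 4.0, 9.0]]

first, u1 = power_method(A)                       # highest eigenvalue
second, u2 = power_method(deflate(A, first, u1))  # second highest eigenvalue
```

The number of repetitions needed depends on the ratio of the two highest eigenvalues; when these are nearly equal the convergence becomes slow.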

Numerical or semi-algebraical methods for calculating eigen- 
values can be based on theorem (24.A). By putting into a Hermitean 
form any unit vector the form takes a value higher than the lowest 
and lower than the highest eigenvalue. By systematically changing 
the components of the vector in such a way that the value of the form 
is lowered it is relatively simple to find an approximation to the 
lowest eigenvalue. The approximation may be good even if the 
vector itself is not a good approximation to the eigenvector. 
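
A short numerical illustration of this remark follows. It is a sketch under assumptions: the matrix is a hypothetical real symmetric matrix whose lowest eigenvalue is 1 with eigenvector proportional to (1, −1), and the trial vector is deliberately taken with one component in error by 20 per cent. The function name `rayleigh_quotient` is introduced here for the value of the form divided by the squared length of the vector.

```python
def rayleigh_quotient(A, x):
    """Value of the form x^T A x for a real vector x, divided by |x|^2."""
    n = len(x)
    numerator = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    denominator = sum(xi * xi for xi in x)
    return numerator / denominator

A = [[2.0, 1.0],
     [1.0, 2.0]]          # assumed example; eigenvalues 1 and 3

trial = [1.0, -0.8]       # poor approximation to the eigenvector (1, -1)
estimate = rayleigh_quotient(A, trial)
```

The estimate lies above the lowest eigenvalue by about 2.4 per cent only, although the trial vector is appreciably in error.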

It is sometimes possible to reduce the characteristic equation so 
that it is replaced by a number of equations of lower degree. At first 
the matrix is transformed in such a way that it divides into sub- 
matrices that extend equally to each side of the leading diagonal; 
the sub-matrices have no elements in common and all elements 
outside the sub-matrices are 0. At a later stage the sub-matrices are 
diagonalized independently of each other. The first transform is to be 
identified prior to solving the characteristic equation. This is possible 
if matrix transformations are linked to transformations of quantities 
describing physical objects. If an object is, for instance, specified in 
terms of spatial coordinates and the specification is symmetric with 
respect to rotations, the appropriate matrix transform can be
constructed by means of the theory of group representation or an
equivalent theory. These mathematical theories are not considered in
this book but their significance will be illustrated by examples 
(cf. Sections 30 and 33). 

If the number of rows and columns is very large and the difference 
between successive eigenvalues is small the deduction of eigenvalues 
may well become impracticable. On the other hand, it may be suffi- 
cient to derive the distribution of eigenvalues, that is the number of 
solutions of the characteristic equation (corresponding to eigen- 
values) between any two numerical limits. Distributions of this kind 
can be derived without solving the characteristic equation. 

If a matrix is almost of diagonal form its diagonalization can be 
performed in algebraical terms by a method of successive
approximation known as ‘perturbation theory’. Let a Hermitean or unitary
matrix have the form

$$ C = A + \beta B \tag{27.1} $$

where A is a diagonal matrix and β is a small parameter. It is attempted
to obtain the diagonalized matrix C′ and the unitary transformation
matrix U in a series of ascending powers of β

$$ C' = A + \beta D + \beta^2 E + \ldots \tag{27.2} $$

and

$$ U = I + \beta V + \beta^2 W + \ldots \tag{27.3} $$

It is further assumed that all eigenvalues $\alpha_j$ of A are distinct from
each other.
Expanding the equation

$$ CU = UC' $$

in the form

$$ A + \beta(B + AV) + \beta^2(AW + BV) + \ldots = A + \beta(VA + D) + \beta^2(WA + VD + E) + \ldots \tag{27.4} $$

the factors of every power of β are set equal to each other. As the
first terms on either side cancel the equation to be solved first is

$$ B + AV - VA - D = 0 \tag{27.5} $$

The diagonal element of this equation is

$$ b_{jj} + \sum_k (a_{jk}v_{kj} - v_{jk}a_{kj}) - d_{jj} = 0 $$

All terms in the sums vanish except $a_{jj}v_{jj} - v_{jj}a_{jj}$ which cancel;
hence

$$ d_{jj} = b_{jj} \tag{27.6} $$

The first order correction of the eigenvalues is accordingly equal to
the diagonal elements of βB.
The non-diagonal elements of (27.5) are related by

$$ b_{jk} + \sum_m (a_{jm}v_{mk} - v_{jm}a_{mk}) - d_{jk} = 0 $$

The last term vanishes and the sums reduce to two terms,
$a_{jj}v_{jk} - v_{jk}a_{kk}$, so that

$$ v_{jk} = \frac{b_{jk}}{a_{kk} - a_{jj}} \qquad (j \neq k) \tag{27.7} $$

The elements $v_{jj}$ remain undetermined; however, as $I + \beta V$ is to be
unitary it is necessary that

$$ (I + \beta V)(I + \beta V)^\dagger = I + \beta(V + V^\dagger) + O(\beta^2) = I + O(\beta^2) $$

Therefore the real part of $v_{jj}$ must vanish. It is convenient to set
$v_{jj} = 0$. The next equation to be solved is

$$ AW + BV - WA - VD - E = 0 \tag{27.8} $$

Its diagonal element is

$$ \sum_m (a_{jm}w_{mj} + b_{jm}v_{mj} - w_{jm}a_{mj} - v_{jm}d_{mj}) - e_{jj} = 0 $$

Again most terms in the sums which involve the elements of the
matrix A vanish; the sum involving D vanishes completely; thus

$$ a_{jj}w_{jj} - w_{jj}a_{jj} + \sum_m b_{jm}v_{mj} - e_{jj} = 0 $$

or, by (27.7),

$$ e_{jj} = -\sum_{m \neq j} \frac{b_{jm}b_{mj}}{a_{mm} - a_{jj}} \tag{27.9} $$

If $a_{jj}$ is the lowest eigenvalue of A and if B is Hermitean then $e_{jj}$
must be negative.

Higher order corrections to the eigenvalues and eigenvectors can 
be derived by continuing this procedure but the expressions tend to 
become unwieldy. 
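
The quality of the second order approximation can be judged from a small numerical example. The sketch below is an illustration only and rests on assumptions: A is taken as the diagonal matrix with elements 0 and 1, B is a hypothetical real symmetric matrix, β = 0.01, and the second order correction is taken in the form $e_{jj} = -\sum_{m \neq j} b_{jm}b_{mj}/(a_{mm} - a_{jj})$. The exact lowest eigenvalue of the 2 x 2 matrix C is obtained from the quadratic characteristic equation.

```python
import math

def second_order_eigenvalue(a, B, beta, j):
    """a_jj + beta*d_jj + beta^2*e_jj, with d_jj = b_jj and e_jj as quoted above."""
    d = B[j][j]
    e = -sum(B[j][m] * B[m][j] / (a[m] - a[j]) for m in range(len(a)) if m != j)
    return a[j] + beta * d + beta ** 2 * e

a = [0.0, 1.0]               # diagonal elements (eigenvalues) of A
B = [[1.0, 1.0],
     [1.0, -1.0]]            # assumed perturbation matrix
beta = 0.01

approx = second_order_eigenvalue(a, B, beta, 0)

# Exact lowest eigenvalue of the 2 x 2 matrix C = A + beta*B.
c11 = a[0] + beta * B[0][0]
c22 = a[1] + beta * B[1][1]
c12 = beta * B[0][1]
exact = 0.5 * (c11 + c22) - math.sqrt(0.25 * (c11 - c22) ** 2 + c12 ** 2)
```

In this example the two values differ by about 2 x 10⁻⁶, which is of the order of β³.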

If some of the eigenvalues are equal the method is not applicable 
in its present form as the denominators in (27.7) would vanish. It can 
nevertheless be applied to an eigenvalue which is different from the 
others. If a set of 2, 3 . . . s eigenvalues are equal to each other one
can start by diagonalizing the 2, 3 . . . s dimensional sub-matrix in
which these eigenvalues are contained. Subsequently the present
method is applicable.

With these hints on practical diagonalization we conclude the 
presentation of matrix algebra and turn to its most important 
applications. 
EXERCISES


1. Diagonalize the matrices (10.1).
2. Show that the matrix A can be diagonalized by the unitary 
transformation matrix U, where 


pty el i//2. 1/3 1/6 
a-| 7, § a υ-Ἰ νὰ ΠΩ 1/6 
πξτωπεδυ LAS 0 1//3 —2/+/6 


3. The matrix 
1: 0-20 0 
“ΟῚ 0 
ον μὰ. 0 
0.00 -| 


is Hermitean and unitary. Discuss the consequences.


CHAPTER 8


OSCILLATIONS


28. Simultaneous differential equations 


Matrices are useful for solving systems of linear differential
equations. Let $x_1, x_2 \ldots x_n$ be unknown functions of a single independent
variable (t) and connected by simultaneous differential equations
with constant coefficients:

$$
\begin{aligned}
dx_1/dt &= a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n \\
dx_2/dt &= a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n \\
&\;\;\vdots \\
dx_n/dt &= a_{n1}x_1 + a_{n2}x_2 + \ldots + a_{nn}x_n
\end{aligned}
\tag{28.1}
$$

They can be written in terms of a vector and a matrix as

$$ dx/dt = Ax \tag{28.2} $$


In solving this equation let V be a matrix independent of t and y
another vector determined by

$$ x = Vy \tag{28.3} $$

Substitution of (28.3) into (28.2) gives

$$ V(dy/dt) = AVy \tag{28.4} $$

It is now assumed that A can be diagonalized by a collineatory
transformation so that

$$ A' = V^{-1}AV $$

is a diagonal matrix. The transformed equation is resolved into
components by

$$
\begin{aligned}
dy_1/dt &= \alpha_1 y_1 \\
&\;\;\vdots \\
dy_n/dt &= \alpha_n y_n
\end{aligned}
\tag{28.5}
$$

These equations are readily integrated:

$$
\begin{aligned}
y_1 &= C_1 \exp(\alpha_1 t) \\
y_2 &= C_2 \exp(\alpha_2 t) \\
&\;\;\vdots \\
y_n &= C_n \exp(\alpha_n t)
\end{aligned}
\tag{28.6}
$$

where $C_1 \ldots C_n$ are arbitrary constants and $\alpha_1 \ldots \alpha_n$ are the
eigenvalues of A. The corresponding eigenvectors are the columns of V.
If the matrix A is Hermitean the components of y are real
exponential functions of t.

It is convenient to identify first the particular solutions of (28.1)
in which all $x_j$ are exponential functions of t. For this purpose one
of the $C_j$ is taken to be unity whereas the others are 0. In this way n
different vectors are defined which have a single non-vanishing
component varying exponentially with t. The transformation (28.3) yields:

$$ x = v_j \exp(\alpha_j t) \tag{28.7} $$

where $v_j$ denotes the jth column of V.

The general solutions of (28.1) are linear combinations of solutions 
of the type (28.7) with arbitrary coefficients. 
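
The method of this section can be followed in a small numerical case. The sketch below assumes the hypothetical matrix A with rows (0, 1) and (1, 0), whose eigenvalues are 1 and −1 with normalized eigenvectors (1, 1)/√2 and (1, −1)/√2; since this V is real and orthogonal its inverse is its transpose. The function name `solve` is introduced for the illustration.

```python
import math

r = 1.0 / math.sqrt(2.0)
V = [[r, r],
     [r, -r]]            # columns: eigenvectors of the assumed A = [[0, 1], [1, 0]]
alphas = [1.0, -1.0]     # eigenvalues of A

def solve(x0, t):
    """x(t) for dx/dt = Ax with x(0) = x0, using x = Vy and (28.6)."""
    # Constants C_j = y_j(0), from y = V^{-1} x with V^{-1} = V^T here.
    C = [sum(V[h][j] * x0[h] for h in range(2)) for j in range(2)]
    # Integrated components y_j = C_j exp(alpha_j t).
    y = [C[j] * math.exp(alphas[j] * t) for j in range(2)]
    # Return to the original variables by x = Vy.
    return [sum(V[h][j] * y[j] for j in range(2)) for h in range(2)]

x_at_1 = solve([1.0, 0.0], 1.0)
```

For the initial values (1, 0) the result is x₁ = cosh t and x₂ = sinh t, a linear combination of the two particular solutions of the type (28.7).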

Systems of simultaneous second order equations with constant 
coefficients can be solved in a similar way. The equation 


$$ d^2x/dt^2 = Ax \tag{28.8} $$

is transformed to

$$ d^2y/dt^2 = A'y $$

where

$$ A' = V^{-1}AV $$

is a diagonal matrix; V and y are defined as before. Thus the original
equations are transformed to

$$
\begin{aligned}
d^2y_1/dt^2 &= \alpha_1 y_1 \\
&\;\;\vdots \\
d^2y_n/dt^2 &= \alpha_n y_n
\end{aligned}
$$

Integration results in

$$
\begin{aligned}
y_1 &= C_1 \exp(\alpha_1^{1/2} t) + D_1 \exp(-\alpha_1^{1/2} t) \\
&\;\;\vdots \\
y_n &= C_n \exp(\alpha_n^{1/2} t) + D_n \exp(-\alpha_n^{1/2} t)
\end{aligned}
\tag{28.9}
$$

where $C_1 \ldots C_n$ and $D_1 \ldots D_n$ are arbitrary constants. If the
eigenvalues of A are real and negative the solutions are periodic functions
of t. It is again possible to identify those particular solutions in
which all $x_j$ depend exponentially upon t and the general solution
is a linear combination of these particular solutions.

Application of matrices in solving simultaneous differential
equations is made in theories of oscillations.
29. Oscillations of three particles in one dimension


Consider at first two particles of mass m interacting with conservative
forces and constrained to move in a single dimension only. In the
absence of any additional forces their equations of motion are

$$
\begin{aligned}
m\,d^2(x_1 + x_2)/dt^2 &= 0 \\
m\,d^2(x_1 - x_2)/dt^2 &= F(x_1 - x_2)
\end{aligned}
\tag{29.1}
$$

where $x_1$ and $x_2$ are the coordinates of the particles, t is the time and
F is the force depending on the distance between the particles.
According to the first equation of (29.1) the centre of mass must stay
at rest or move at constant speed. It is assumed that the forces of
interaction tend to keep the particles at fixed distances apart
(denoted by a) or to bring them back to these distances if they have been
displaced:

$$ F(x_1 - x_2) = -(x_1 - x_2 - a)k + O(x_1 - x_2 - a)^2 $$

where k is a positive constant. Denoting $x_1 - x_2 - a$ by x and
neglecting the last term the second equation of (29.1) becomes

$$ \frac{d^2x}{dt^2} = -\frac{k}{m}\,x $$

According to this equation the particles perform simple harmonic
motion with frequency $\omega = (k/m)^{1/2}$.

Consider now three particles performing an oscillatory movement
about some rectilinear configuration of equilibrium. It is convenient
to specify their positions in terms of their displacements $x_1$, $x_2$
and $x_3$ from their positions of equilibrium. The forces of interaction
are to act between nearest neighbours only and to be equal to

$$ F_1 = -k(x_1 - x_2), \qquad F_3 = -k(x_3 - x_2), \qquad F_2 = -F_1 - F_3 $$

(k being a positive constant) so that the interaction between particles
1 and 2 is the same as the interaction between particles 3 and 2.

It is further assumed that the masses of the particles 1 and 3 are
equal to each other and differ from the mass of the particle 2. The
masses are denoted by m and M respectively. The equations of
motion are

$$
\begin{aligned}
m\,d^2x_1/dt^2 &= k(-x_1 + x_2) \\
M\,d^2x_2/dt^2 &= k(x_1 - 2x_2 + x_3) \\
m\,d^2x_3/dt^2 &= k(x_2 - x_3)
\end{aligned}
\tag{29.2}
$$

and can be solved by the method described in the preceding section.
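First, new coordinates are defined as

$$ s_1 = m^{1/2}x_1, \qquad s_2 = M^{1/2}x_2, \qquad s_3 = m^{1/2}x_3 \tag{29.3} $$

to be regarded as the components of a vector s. The equations of
motion are transformed to

$$ d^2s/dt^2 = As \tag{29.4} $$

where

$$
A = k \begin{pmatrix}
-1/m & (1/mM)^{1/2} & 0 \\
(1/mM)^{1/2} & -2/M & (1/mM)^{1/2} \\
0 & (1/mM)^{1/2} & -1/m
\end{pmatrix}
$$

is a real symmetric matrix, so that it can be diagonalized by a unitary
transformation. As a first step the characteristic equation

$$
\begin{vmatrix}
-(k/m) - \alpha & k/(Mm)^{1/2} & 0 \\
k/(mM)^{1/2} & -2(k/M) - \alpha & k/(Mm)^{1/2} \\
0 & k/(Mm)^{1/2} & -(k/m) - \alpha
\end{vmatrix} = 0
$$

is expanded:

$$ \alpha \left(\alpha + \frac{k}{m}\right)\left[\alpha + k\left(\frac{2}{M} + \frac{1}{m}\right)\right] = 0 $$

The solutions are

$$ \alpha_1 = 0; \qquad \alpha_2 = -\frac{k}{m}; \qquad \alpha_3 = -k\left(\frac{2}{M} + \frac{1}{m}\right) $$
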
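The eigenvectors are derived by solving three sets of three
simultaneous equations (a common factor k having been divided out).

$$
\begin{aligned}
-\frac{1}{m}\,v_{11} + \left(\frac{1}{mM}\right)^{1/2} v_{21} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{11} - \frac{2}{M}\,v_{21} + \left(\frac{1}{mM}\right)^{1/2} v_{31} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{21} - \frac{1}{m}\,v_{31} &= 0
\end{aligned}
$$

$$
\begin{aligned}
\left(\frac{1}{mM}\right)^{1/2} v_{22} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{12} + \left(\frac{1}{m} - \frac{2}{M}\right) v_{22} + \left(\frac{1}{mM}\right)^{1/2} v_{32} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{22} &= 0
\end{aligned}
$$

$$
\begin{aligned}
\frac{2}{M}\,v_{13} + \left(\frac{1}{mM}\right)^{1/2} v_{23} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{13} + \frac{1}{m}\,v_{23} + \left(\frac{1}{mM}\right)^{1/2} v_{33} &= 0 \\
\left(\frac{1}{mM}\right)^{1/2} v_{23} + \frac{2}{M}\,v_{33} &= 0
\end{aligned}
$$
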
It is sufficient to solve two equations in each set of three; the third
is redundant but does not involve any inconsistency. By solving the
equations the ratios of two of the components to the third
component are obtained; the components themselves are derived by
normalization. If the eigenvectors are normalized and can be shown
to be mutually orthogonal they are the columns of a unitary matrix V
which is the transformation matrix of the matrix A. In this way the
matrix V is constructed:

$$
V = \begin{pmatrix}
1/\gamma_1^{1/2} & 1/\gamma_2^{1/2} & 1/\gamma_3^{1/2} \\
(M/m\gamma_1)^{1/2} & 0 & -2(m/M\gamma_3)^{1/2} \\
1/\gamma_1^{1/2} & -1/\gamma_2^{1/2} & 1/\gamma_3^{1/2}
\end{pmatrix}
$$

where

$$ \gamma_1 = 2 + (M/m); \qquad \gamma_2 = 2; \qquad \gamma_3 = 2 + (4m/M) $$

It is readily verified that the three columns of V are mutually
orthogonal.
By means of this transformation matrix equation (29.4) is
transformed to

$$ d^2y_j/dt^2 = \alpha_j y_j $$

which is solved by

$$ y_j = C_j \cos[(-\alpha_j)^{1/2}\,t + \delta_j] $$

where $C_j$ and $\delta_j$ are arbitrary real constants. We define three
vectors by taking the constants $\delta_j = 0$ and letting $(C_1, C_2, C_3)$ be
successively 1, 0, 0; 0, 1, 0; 0, 0, 1.

As a result we obtain three ‘modes of vibration’ determining the 
synchronous movement of the three particles. They perform simple 
harmonic motion with real or zero frequencies (corresponding to the 
eigenvalues of A being negative or zero). 

So far the modes of vibration are specified in terms of the
components of s. By way of equation (29.3) they are converted to equations
determining the time dependence of the displacements of the
particles $x_j$. The result is:

$$
\begin{aligned}
x_1^{(1)} &= (m\gamma_1)^{-1/2} = x_2^{(1)} = x_3^{(1)} \\
x_1^{(2)} &= (m\gamma_2)^{-1/2} \cos[(k/m)^{1/2}\,t] = -x_3^{(2)} \\
x_2^{(2)} &= 0 \\
x_1^{(3)} &= (m\gamma_3)^{-1/2} \cos\{[(2k/M) + (k/m)]^{1/2}\,t\} = x_3^{(3)} \\
x_2^{(3)} &= -2(m/M)\,x_1^{(3)}
\end{aligned}
\tag{29.8}
$$

The first mode consists of a translation of the three particles at
uniform velocity (corresponding to zero frequency). The two other
modes are oscillations proper. In the second mode the central
particle stays at rest whereas the two other particles move at equal
amplitudes and opposite phases. In the third mode the first and
third particle have equal amplitudes and phases; the second particle
moves with the opposite phase. The
general solution of the equations of motion is a superposition of
these modes with arbitrary phases and amplitudes.
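
The eigenvalues and eigenvectors of the three-particle system can be checked numerically. The sketch below is an illustration under stated assumptions: the matrix A is taken with elements $-k/m$, $-2k/M$, $-k/m$ in the leading diagonal and $k/(mM)^{1/2}$ in the adjacent diagonals, the eigenvalues as 0, $-k/m$ and $-k(2/M + 1/m)$, and the columns of V as proportional to $(1, (M/m)^{1/2}, 1)$, $(1, 0, -1)$ and $(1, -2(m/M)^{1/2}, 1)$. The numerical values of k, m and M and the function names are hypothetical.

```python
import math

def three_particle_system(k, m, M):
    """Matrix A of (29.4), its eigenvalues and the transformation matrix V."""
    c = k / math.sqrt(m * M)
    A = [[-k / m, c, 0.0],
         [c, -2.0 * k / M, c],
         [0.0, c, -k / m]]
    alphas = [0.0, -k / m, -k * (2.0 / M + 1.0 / m)]
    g1, g2, g3 = 2.0 + M / m, 2.0, 2.0 + 4.0 * m / M   # normalization constants
    V = [[1 / math.sqrt(g1), 1 / math.sqrt(g2), 1 / math.sqrt(g3)],
         [math.sqrt(M / (m * g1)), 0.0, -2.0 * math.sqrt(m / (M * g3))],
         [1 / math.sqrt(g1), -1 / math.sqrt(g2), 1 / math.sqrt(g3)]]
    return A, alphas, V

def residual(A, alpha, v):
    """Largest component of A v - alpha v; zero for an exact eigenvector."""
    return max(abs(sum(A[h][j] * v[j] for j in range(3)) - alpha * v[h])
               for h in range(3))

A, alphas, V = three_particle_system(k=2.0, m=1.0, M=3.0)   # assumed values
columns = [[V[h][j] for h in range(3)] for j in range(3)]
residuals = [residual(A, alphas[j], columns[j]) for j in range(3)]
overlaps = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(3)]
            for i in range(3)]
```

All three residuals vanish to rounding accuracy and the matrix of overlaps is the unit matrix, so that the columns of V are normalized and mutually orthogonal.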

The present theory is readily generalized so that it applies to 
chains of four, five and more particles. The solution of the charac- 
teristic equation is, however, no longer possible by elementary 
methods and the other steps in the mathematical deduction become 
more and more cumbersome. If, however, the number of particles 
is very large the theory can again be simplified. This will be con- 
sidered in the following section. 


30. Linear chains 


Consider a chain of particles oscillating about a configuration of 
equilibrium in which the ‘links’ of the chain have a fixed length. 
Forces acting along the links tend to restore their lengths to the 
value at equilibrium. If the particles are labelled 1 to N the force on 
the jth particle is equal to 
$$ k(x_{j-1} - 2x_j + x_{j+1}) $$

where the positive constant k is the same for all triplets of particles. 

Provided that the number of particles is very large it is plausible 
to assume that the movement of the two particles at the ends of the 
chain will virtually not affect the movement of the remaining N — 2 
particles. The theory can be simplified by letting the particles at the 
ends be subject to a constraint known as a ‘periodic boundary con- 
dition’. The meaning of this constraint is explained by placing an 
additional particle at the left end and an additional particle at the 
right end of the chain. The constraint is of such a kind that the addi- 
tional particle at the left end is to move precisely in phase with the 
particle N; also the additional particle at the right end is to move 
precisely in phase with the particle 1. The periodic boundary con- 
dition is readily taken into account by assuming that the particle N 
interacts with the particle 1 as if they were nearest neighbours. 

With these assumptions the equations of motion of the chain are 


$$ m(d^2x_j/dt^2) = k(x_{j-1} - 2x_j + x_{j+1}) \tag{30.1} $$

where j = 1, 2 . . . N ($x_{N+1} = x_1$; $x_0 = x_N$), and where m is the mass
of the particles. In matrix and vector notation these equations are
represented by

$$ d^2x/dt^2 = Ax \tag{30.2} $$

The elements of the matrix A are zero except for

$$ a_{jj} = -2k/m; \qquad a_{j(j+1)} = a_{j(j-1)} = k/m \tag{30.3} $$

The elements $a_{1N}$ and $a_{N1}$ are equal to k/m.
On account of the ‘periodic boundary condition’ the chain has
acquired an important symmetry property. If it moves in such a way
that the particle N takes the place of 1, the particle 1 takes the place
of 2, the particle N − 1 takes the place of particle N with all
intermediate particles moving up by one link, the set of equations (30.1)
remains unchanged. On account of this it is possible to identify
the eigenvectors of the matrix A without having any knowledge of
the eigenvalues.

The above shift to the right is mathematically specified by a
matrix T with the elements $t_{j(j+1)} = 1$ (including $t_{N1} = 1$) whereas
all other elements $t_{jk}$ vanish. Thus, if

$$ x' = Tx $$

then

$$ x'_j = x_{j+1} $$

It will now be shown that the matrices A and T commute. Let
S = TA − AT, so that

$$ s_{jk} = \sum_l (t_{jl}a_{lk} - a_{jl}t_{lk}) = t_{j(j+1)}a_{(j+1)k} - a_{j(k-1)}t_{(k-1)k} = a_{(j+1)k} - a_{j(k-1)} $$

Hence

$$ s_{(j-1)k} = a_{jk} - a_{(j-1)(k-1)} $$

On account of (30.3) it is merely necessary to consider those elements
of S which depend on the elements of A in the leading diagonal or
in a diagonal adjacent to it. They are (in units of k/m)

$$
\begin{aligned}
s_{(j-1)(j-1)} &= a_{j(j-1)} - a_{(j-1)(j-2)} = 1 - 1 = 0 \\
s_{(j-1)j} &= a_{jj} - a_{(j-1)(j-1)} = -2 - (-2) = 0 \\
s_{(j-1)(j+1)} &= a_{j(j+1)} - a_{(j-1)j} = 1 - 1 = 0
\end{aligned}
$$

Hence all elements of S vanish so that

$$ TA - AT = 0 \tag{30.4} $$
The eigenvectors of A are determined by

$$ Av = \alpha v $$

Multiplication by T gives

$$ TAv = ATv = \alpha Tv \tag{30.5} $$

so that Tv is also an eigenvector. If a normalized eigenvector is
multiplied by $e^{i\chi}$ where χ is a real number it still remains a normalized
eigenvector. It follows then from (30.5) that

$$ Tv = v\,e^{i\chi} \tag{30.6} $$

It follows also that the eigenvectors of A are also eigenvectors of T.
In order that equation (30.6) should be valid it is necessary to prove
that the eigenvalues of A are non-degenerate or at least, to show
that the number of eigenvectors derived from (30.6) is equal to N.
This will be shown in due course.

In order to comply with equation (30.6) and simultaneously with

$$ (Tv)_j = v_{j+1} $$

the eigenvector $v_n$ must have the components

$$ v_{1n},\; v_{1n}\exp(i\chi_n),\; v_{1n}\exp(2i\chi_n) \ldots v_{1n}\exp[(N-1)i\chi_n] $$

Further, if this vector is to be normalized it is necessary to put

$$ v_{1n} = N^{-1/2} $$

The phase constants χ are found by remembering that $T^2, T^3 \ldots T^N$
have the effect of moving the chain 2, 3 . . . N links respectively
to the right. But the last operation restores the original position of
the chain. Therefore it is necessary that

$$ T^N = I, \qquad e^{iN\chi} = 1 $$

$e^{i\chi_n}$ must accordingly be an Nth root of unity and

$$ N\chi_n = 2\pi n, \qquad \chi_n = 2\pi n/N $$

where n may be any integer between 1 and N. In this way N
independent eigenvectors are identified and on account of this result
the validity of the procedure is justified.

Let V be a unitary matrix diagonalizing A and independent of the
time. The elements of V are accordingly

$$ v_{jn} = N^{-1/2}\, e^{2\pi i n j/N} \tag{30.7} $$

The verification of the orthogonality of the columns of V is left to
the reader as an exercise. It may be pointed out that the identification
of the eigenvectors is entirely independent of k and the mass of the
particles.

The eigenvalues of A are found by substituting the expressions
(30.7) into the equation

$$ Av_n = \alpha_n v_n $$

In terms of components this is

$$ \alpha_n v_{ln} = \sum_j a_{lj}v_{jn} = a_{l(l-1)}v_{(l-1)n} + a_{ll}v_{ln} + a_{l(l+1)}v_{(l+1)n} $$

By (30.3) and (30.7)

$$ \alpha_n\, e^{2\pi i n l/N} = \frac{k}{m}\, e^{2\pi i n l/N} \left[ e^{-2\pi i n/N} - 2 + e^{2\pi i n/N} \right] $$

The expression in the square bracket is equal to

$$ 2\left[\cos\left(\frac{2\pi n}{N}\right) - 1\right] = -4\sin^2\left(\frac{\pi n}{N}\right) $$

Hence the eigenvalues are

$$ \alpha_n = -\frac{4k}{m}\sin^2\left(\frac{\pi n}{N}\right) \tag{30.8} $$

They are negative and their magnitudes are of the order of the square
of the frequency for an isolated pair of particles.
The modes of vibration are specified by 


$$ x = v_n \exp(\alpha_n^{1/2}\,t) $$


They represent a type of movement in which all particles of the chain 
oscillate synchronously with their amplitudes periodically returning 
to the same value after N/n links of the chain. The modes are very 
similar to the standing waves of acoustics. 

This deduction of the frequencies of a chain shows that the solu- 
tion of the characteristic equation can be avoided if the system is 
symmetric enough. 
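
The result can be confirmed numerically for a short chain. In the following sketch, which is an illustration only, the matrix of (30.3) is built with the periodic boundary condition, the eigenvectors are taken with components $N^{-1/2}\exp(2\pi i n j/N)$ and the eigenvalues as $-(4k/m)\sin^2(\pi n/N)$. The values N = 6, k = 1.5, m = 0.5 and the function names are assumptions made for the example.

```python
import cmath
import math

def chain_matrix(N, k, m):
    """Matrix A of (30.3), including the elements a_1N and a_N1."""
    A = [[0.0] * N for _ in range(N)]
    for row in range(N):
        A[row][row] = -2.0 * k / m
        A[row][(row + 1) % N] += k / m    # right-hand neighbour (periodic)
        A[row][(row - 1) % N] += k / m    # left-hand neighbour (periodic)
    return A

def mode(N, n):
    """Components of the n-th eigenvector for the sites 1 ... N."""
    return [cmath.exp(2j * math.pi * n * site / N) / math.sqrt(N)
            for site in range(1, N + 1)]

def eigenvalue(N, k, m, n):
    return -4.0 * k / m * math.sin(math.pi * n / N) ** 2

N, k, m = 6, 1.5, 0.5          # assumed values
A = chain_matrix(N, k, m)

# Largest deviation of A v_n from alpha_n v_n over all modes and components.
worst = 0.0
for n in range(1, N + 1):
    v = mode(N, n)
    Av = [sum(A[h][j] * v[j] for j in range(N)) for h in range(N)]
    worst = max(worst, max(abs(Av[h] - eigenvalue(N, k, m, n) * v[h]) for h in range(N)))
```

The deviation `worst` is of the order of the rounding error, without the characteristic equation having been solved at any stage.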

Consider now a chain where the links have two different lengths
which alternate regularly. The force constants also have two
alternating values. The equations of motion for a chain of 2N particles
are now

$$
\begin{aligned}
m\,d^2x_{2j}/dt^2 &= Kx_{2j-1} - (k + K)x_{2j} + kx_{2j+1} \\
m\,d^2x_{2j+1}/dt^2 &= kx_{2j} - (k + K)x_{2j+1} + Kx_{2j+2}
\end{aligned}
\tag{30.9}
$$

where k and K are the force constants and j = 1, 2 . . . N. The
periodic boundary condition is taken into account by defining
$x_{2N+1} = x_1$ and $x_0 = x_{2N}$. The equations are of the form (30.2) if
A is a matrix of 2N rows and columns and

$$
\begin{aligned}
a_{2j,2j} = a_{(2j+1)(2j+1)} &= -(k + K)/m \\
a_{(2j-1)2j} = a_{2j(2j-1)} &= K/m \\
a_{(2j+1)2j} = a_{2j(2j+1)} &= k/m \qquad (j = 1 \ldots N)
\end{aligned}
\tag{30.10}
$$


By the same arguments as before the components of the
eigenvectors can be identified, but not completely. There are two
components associated with every value of j:

$$
\begin{aligned}
v_{2j,n} &= p\,(2N)^{-1/2} \exp\left(\frac{2\pi i n j}{N}\right) \\
v_{(2j+1),n} &= q\,(2N)^{-1/2} \exp\left[\frac{2\pi i n (j + 1/2)}{N}\right]
\end{aligned}
$$

where n may have the values 1 . . . N and p and q are independent of j
but otherwise undetermined. To find the eigenvalues one solves

$$ Av_n = \alpha_n v_n $$

| 
and the following set of equations is obtained:

$$
\begin{aligned}
-p\,[k + K + m\alpha_n] + q\left[K\exp\left(\frac{-i\pi n}{N}\right) + k\exp\left(\frac{i\pi n}{N}\right)\right] &= 0 \\
p\left[k\exp\left(\frac{-i\pi n}{N}\right) + K\exp\left(\frac{i\pi n}{N}\right)\right] - q\,[k + K + m\alpha_n] &= 0
\end{aligned}
\tag{30.11}
$$

In these equations a common factor $(2N)^{-1/2}\exp(2\pi i n j/N)$ has been
divided out; the resulting equations (30.11) are independent of j.
In order to solve them it is necessary to diagonalize the Hermitean
matrix

$$
\frac{1}{m}\begin{pmatrix}
-(k + K) & K\exp(-i\pi n/N) + k\exp(i\pi n/N) \\
k\exp(-i\pi n/N) + K\exp(i\pi n/N) & -(k + K)
\end{pmatrix}
$$

Thus the eigenvalues of this matrix are the same as the eigenvalues
of A and equal to

$$ \alpha_n = \frac{1}{m}\left\{ -(k + K) \pm \left[ K^2 + k^2 + 2kK\cos\left(\frac{2\pi n}{N}\right) \right]^{1/2} \right\} \tag{30.12} $$


Every value of n corresponds therefore to two eigenvalues. They are
all either larger or smaller than −(k + K)/m. Thus the modes of the
chain fall into two sets, the frequencies of which are separated by
a finite gap.
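
The two sets of eigenvalues and the gap between them can be exhibited numerically. The sketch below is an illustration under assumptions: the eigenvalues are taken in the form $(1/m)\{-(k + K) \pm [K^2 + k^2 + 2kK\cos(2\pi n/N)]^{1/2}\}$, the non-diagonal element of the 2 x 2 Hermitean matrix as $K\exp(-i\pi n/N) + k\exp(i\pi n/N)$, and the values N = 8, k = 1, K = 3, m = 1 as well as the function names are hypothetical.

```python
import cmath
import math

def branches(n, N, k, K, m):
    """The two eigenvalues belonging to the mode number n."""
    root = math.sqrt(K * K + k * k + 2.0 * k * K * math.cos(2.0 * math.pi * n / N))
    return (-(k + K) + root) / m, (-(k + K) - root) / m

def offdiagonal(n, N, k, K):
    """Non-diagonal element of the 2 x 2 Hermitean matrix (before division by m)."""
    return K * cmath.exp(-1j * math.pi * n / N) + k * cmath.exp(1j * math.pi * n / N)

N, k, K, m = 8, 1.0, 3.0, 1.0      # assumed values
upper = [branches(n, N, k, K, m)[0] for n in range(1, N + 1)]
lower = [branches(n, N, k, K, m)[1] for n in range(1, N + 1)]

# Interval between the two sets; it closes only if k = K.
gap = min(upper) - max(lower)
```

For these values the eigenvalues of one set lie between −2 and 0, those of the other between −8 and −6, and the gap amounts to 2|K − k|/m = 4.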

A similar result is obtained for linear chains in which the force
constants are uniform but the masses of the particles have two
alternating values.

The chains considered in this section can be regarded as one
dimensional models for a monatomic crystal, for a crystal of
diatomic molecules and for a lattice consisting of two kinds of ions.
In the latter case the two types of modes are the ‘acoustic’ and
‘optical’ oscillations of the lattice. As far as the diatomic molecules are
concerned the two branches do not admit such a simple
interpretation but might be roughly described as intermolecular and
intramolecular oscillations.

It can be noted that the mechanical oscillations of chains of 
particles are closely analogous to certain types of electric oscillations 
in communication lines. Consider as an example a number of coils in 
series, with a lead to earth, via a condenser, between every pair of 
coils. A line of this sort can perform electrical oscillations which are 
not too heavily damped. If the coils have equal inductances and the 
condensers equal capacities, the natural frequencies of the line are 
approximately defined by an equation of the form (30.8). In this case 
the line admits low-frequency oscillations, filtering out frequencies
higher than a finite limit. If two kinds of condensers are used
alternately the line admits two separated bands of frequencies.


31. Disordered chains 


If the masses of the particles or the force constants in the links of a 
chain are not uniform but distributed at random the high symmetry 
of the chain is lost and the methods of Section 30 are no longer 
applicable. In fact there is no general method for deriving the fre- 
quencies of the modes of vibration. A certain amount of insight into 
the dynamics of the chain can still be obtained from the spectral 
distribution, that is the number of modes within given intervals of 
frequencies. 

In this section a deduction of this distribution is given; it will be 
noted that even this limited result requires a greater mathematical 
effort than the complete dynamics of regular chains. 


The equations of motion are of the form (29.4) if the elements of
the matrix A are

$$
\begin{aligned}
a_{jj} &= -[k_{j(j-1)} + k_{j(j+1)}]/m_j = -a_j \\
a_{j(j-1)} = a_{(j-1)j} &= k_{j(j-1)}/(m_j m_{j-1})^{1/2} = -b_{j-1}
\end{aligned}
\tag{31.1}
$$

The masses $m_j$ and the force constants $k_{j(j-1)} = k_{(j-1)j}$ are
arbitrary positive numbers. It is not attempted to invoke the periodic
boundary condition. All matrix elements outside the leading diagonal
and the two adjacent diagonals are zero.

The eigenvalues of A are real and it is assumed that they are 
negative; the movement of the chain then consists of stable oscilla- 
tions. The negative squares of their circular frequencies are equal to 
the eigenvalues of A. 

The number of modes where the square of the frequency is equal
to or smaller than an arbitrary number μ² is a positive integer not
larger than N. If N is large enough this number can be replaced by
a continuous function of μ² which will be denoted by $\int_0^\mu D(\mu)\,d\mu$.
The function under the integral sign is the distribution function of
the modes on the frequency scale; D(μ) dμ is the number of modes
in the infinitesimal interval of frequencies between μ and μ + dμ.
By definition,

$$ \int_0^\infty D(\mu)\,d\mu = N \tag{31.2} $$

In attempting to derive an expression for the function D(μ) in
terms of the matrix elements an indirect approach is required. It
involves the introduction of auxiliary quantities which do not admit
any direct physical interpretation.

In the first instance consider the matrix 

$$P = -A + \lambda^{-1} I \tag{31.3}$$

where $\lambda$ is a parameter and I the unit matrix. Let the negatives of 
the eigenvalues of A be denoted by $\mu_i$, which are, accordingly, 
positive numbers. 

Then 

$$\det P = \prod_{i=1}^{N} \left(\mu_i + \lambda^{-1}\right)$$

and 

$$\log \det P = \sum_{i=1}^{N} \log\,[\mu_i + (1/\lambda)]$$

In accordance with the assumed continuity of $D(\mu)$ the last sum is 
replaced by an integral 

$$\log \det P = \int_0^{\infty} \log\,[\mu + (1/\lambda)]\,D(\mu)\,d\mu \tag{31.4}$$

The left-hand side of this equation is a function of the parameter 
$\lambda$ and can be regarded as a 'transform' of the distribution function 
$D(\mu)$, comparable with Fourier or Laplace transforms. If the 
left-hand side can be expressed in terms of the matrix elements of A 
and if the transformation can be inverted we have an explicit solution 
of the problem under consideration. 

As the next step in this direction the determinant on the left-hand 
side is scrutinized. Let $P^{(j)}$ be the sub-matrix consisting of the rows 
and columns of $\lambda P$ which are labelled $j, j+1, j+2, \ldots, N$, so 
that the larger values of j correspond to the smaller numbers of 
dimensions: 

$$P^{(j)} = \begin{pmatrix}
\lambda a_j + 1 & \lambda b_j & & & \\
\lambda b_j & \lambda a_{j+1} + 1 & \lambda b_{j+1} & & \\
 & \ddots & \ddots & \ddots & \\
 & & \lambda b_{N-2} & \lambda a_{N-1} + 1 & \lambda b_{N-1} \\
 & & & \lambda b_{N-1} & \lambda a_N + 1
\end{pmatrix}$$

According to the definition 

$$\det P^{(1)} = \det(\lambda P) = \lambda^{N} \det P \tag{31.5}$$

The expansion of $\det P^{(j)}$ according to (13.1) consists of two terms 

$$\det P^{(j)} = (\lambda a_j + 1)\det P^{(j+1)} - \lambda^2 b_j^2 \det P^{(j+2)} \tag{31.6}$$

(31.6) can be used as a recurrence formula, by expressing the 
determinants in terms of 

$$\det P^{(N)} = \lambda a_N + 1, \qquad
\det P^{(N-1)} = (\lambda a_N + 1)(\lambda a_{N-1} + 1) - \lambda^2 b_{N-1}^2 \tag{31.7}$$

These determinants can be related to certain continued fractions to 
be denoted by $f_j$ where j can have the values 1, ..., N. They are 
defined by 

$$f_j = 1 + \lambda a_j - \cfrac{\lambda^2 b_j^2}{1 + \lambda a_{j+1} - \cfrac{\lambda^2 b_{j+1}^2}{1 + \lambda a_{j+2} - \cdots}} \tag{31.8}$$

the last denominator being equal to $1 + \lambda a_N$. Thus 

$$f_j = 1 + \lambda a_j - \lambda^2 b_j^2 / f_{j+1} \tag{31.9}$$

and this is supplemented by 

$$f_N = 1 + \lambda a_N, \qquad
f_{N-1} = 1 + \lambda a_{N-1} - \frac{\lambda^2 b_{N-1}^2}{1 + \lambda a_N} \tag{31.10}$$

According to (31.10) and (31.7) 

$$\det P^{(N)} = f_N, \qquad \det P^{(N-1)} = f_{N-1} \det P^{(N)} \tag{31.11}$$

If there is any value of j for which 

$$\det P^{(j-1)} = f_{j-1} \det P^{(j)} \tag{31.12}$$

it follows from (31.6) that 

$$\det P^{(j-2)} = (\lambda a_{j-2} + 1)\det P^{(j-1)} - \lambda^2 b_{j-2}^2 \det P^{(j)}
= [(\lambda a_{j-2} + 1) f_{j-1} - \lambda^2 b_{j-2}^2]\,\det P^{(j)}$$

and by (31.9) 

$$\det P^{(j-2)} = f_{j-2}\, f_{j-1} \det P^{(j)} \tag{31.13}$$

By (31.11) equation (31.12) is valid for j = N and by (31.13) it 
follows that 

$$\det P^{(N-2)} = f_{N-2}\, f_{N-1}\, f_N, \qquad \ldots, \qquad
\det P^{(1)} = \prod_{j=1}^{N} f_j$$

and, by (31.5), 

$$\det P = \lambda^{-N} \prod_{j=1}^{N} f_j \tag{31.14}$$

The left-hand side of (31.4) is accordingly 

$$\log \det P = -Nw + \sum_{j=1}^{N} \log f_j \tag{31.15}$$

where 

$$w = \log \lambda \tag{31.16}$$
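The recurrence (31.9) and the product formula (31.14) lend themselves to a numerical check. The following Python sketch is an illustration only, not part of the original argument. It assumes a chain with fixed ends, takes $a_j$ and $b_j$ as the diagonal and off-diagonal elements of $-A$, and uses randomly chosen masses and force constants; the value of $\lambda$ and all function names are hypothetical choices made for the example. It compares the product of the continued fractions with the determinant of $\lambda P$ obtained by direct elimination.

```python
import math
import random


def chain_coefficients(masses, springs):
    """Return (a, b): diagonal and off-diagonal elements of -A.

    springs[j] joins particle j-1 to particle j; springs[0] and springs[-1]
    tie the end particles to fixed walls (an assumption of this sketch).
    """
    n = len(masses)
    a = [(springs[j] + springs[j + 1]) / masses[j] for j in range(n)]
    b = [-springs[j + 1] / math.sqrt(masses[j] * masses[j + 1])
         for j in range(n - 1)]
    return a, b


def continued_fractions(a, b, lam):
    """f_1 ... f_N (stored 0-based) from (31.10) and the recurrence (31.9)."""
    n = len(a)
    f = [0.0] * n
    f[n - 1] = 1.0 + lam * a[n - 1]
    for j in range(n - 2, -1, -1):
        f[j] = 1.0 + lam * a[j] - (lam * b[j]) ** 2 / f[j + 1]
    return f


def lambda_p(a, b, lam):
    """Full tridiagonal matrix lambda*P = I + lambda*(-A)."""
    n = len(a)
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        p[j][j] = 1.0 + lam * a[j]
        if j + 1 < n:
            p[j][j + 1] = p[j + 1][j] = lam * b[j]
    return p


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


rng = random.Random(0)          # fixed seed so that the run is reproducible
N = 8
masses = [rng.uniform(1.0, 3.0) for _ in range(N)]
springs = [rng.uniform(0.5, 2.0) for _ in range(N + 1)]
lam = 0.7

a, b = chain_coefficients(masses, springs)
f = continued_fractions(a, b, lam)

det_direct = determinant(lambda_p(a, b, lam))   # det P^(1) = det(lambda P)
det_from_f = math.prod(f)                       # product of the f_j
log_det_p = -N * math.log(lam) + sum(math.log(x) for x in f)   # (31.15)

print(det_direct, det_from_f, log_det_p)
```

The continued fractions need only N divisions, whereas the direct determinant is used here purely as an independent check. For positive $\lambda$ all $f_j$ are positive, so that the logarithms in (31.15) are real.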


In this way the left-hand side of (31.4) is expressed in terms of the 
matrix elements $a_{jk}$. Finally it is necessary to express the function 
$D(\mu)$ in terms of the left-hand side of (31.4). 

The right-hand side of (31.4) can be expressed in terms of a function 
of w denoted by $\Omega(w)$ and defined by the equation 

$$-Nw + \int_0^{\infty} \log\,[\mu e^{w} + 1]\,D(\mu)\,d\mu = -Nw + \Omega(w)
= -Nw + \sum_{j=1}^{N} \log f_j \tag{31.17}$$

and thus 

$$d\Omega/dw = e^{w} \int_0^{\infty} \mu D(\mu)\,[1 + \mu e^{w}]^{-1}\,d\mu \tag{31.18}$$

In order to express the left-hand side of this equation in terms of the 
function in the integrand another auxiliary quantity is required and 
defined as 

$$R(q) = \int_{-\infty}^{\infty} (d\Omega/dw)\,\exp[-w(\tfrac{1}{2} + iq)]\,dw \tag{31.19}$$

By (31.18) 

$$R(q) = \int_{-\infty}^{\infty} \exp(\tfrac{1}{2}w - iqw)
\left[\int_0^{\infty} \mu D(\mu)\,[1 + \mu e^{w}]^{-1}\,d\mu\right] dw$$

Rearranging the second integrand gives 

$$\exp(\tfrac{1}{2}w)\exp(-iqw)\,[1 + \exp(w + \log\mu)]^{-1}
= [1 + e^{z}]^{-1}\,\{\exp\tfrac{1}{2}(z - \log\mu)\}\,[\exp(-iqz + iq\log\mu)]$$

where $z = w + \log\mu$. It follows that 

$$R(q) = \int_0^{\infty} \mu^{1/2} D(\mu)\,e^{iq\log\mu}\,d\mu
\int_{-\infty}^{\infty} \frac{e^{-iqz}}{2\cosh\tfrac{1}{2}z}\,dz$$


The last integral is the Fourier transform of $1/(2\cosh\tfrac{1}{2}z)$. It 
can be evaluated by contour integration. The contour consists of the 
real axis, a semicircle at infinity below the real axis and a set of small 
circles about the poles along the negative imaginary axis. The integral 
is reduced to a sum of residues arising from these poles and summation 
results in $\pi/\cosh \pi q$. The auxiliary function R is in this way 
related to the distribution function $D(\mu)$ by a Fourier transformation. 
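The value $\pi/\cosh \pi q$ of the Fourier transform of $1/(2\cosh\tfrac{1}{2}z)$ can be confirmed without contour integration. The sketch below is an assumption-laden illustration: it uses a plain trapezoidal rule on a finite interval, and the interval width, the number of steps and the function name are arbitrary choices. Only the real part is summed, since the imaginary part vanishes by symmetry.

```python
import math


def fourier_of_half_sech(q, half_width=60.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(-iqz) / (2 cosh(z/2)).

    The integrand is even in z, so only cos(qz) contributes. The tails
    beyond |z| = half_width are of order exp(-half_width / 2) and are ignored.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(q * z) / (2.0 * math.cosh(0.5 * z))
    return total * h


for q in (0.0, 0.5, 1.0, 2.0):
    numerical = fourier_of_half_sech(q)
    closed_form = math.pi / math.cosh(math.pi * q)
    print(q, numerical, closed_form)
```

The rapid decay of the transform with increasing q is the reason why the factor $\cosh \pi q$ appears when the relation is inverted.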


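$$(2\pi^3)^{-1/2}(\cosh \pi q)\,R(q) = (2\pi)^{-1/2}
\int_{-\infty}^{\infty} \mu^{3/2} D(\mu)\,e^{iq\log\mu}\,d(\log\mu)$$

The distribution function is accordingly found by Fourier inversion 

$$D(\mu) = (1/2\pi^2) \int_{-\infty}^{\infty} (\cosh \pi q)\,R(q)\,\mu^{-(3/2 + iq)}\,dq \tag{31.20}$$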


This equation together with (31.19), (31.18) and (31.17) determines 
$D(\mu)$ in terms of the matrix elements $a_{jk}$ which are the coefficients 
in the equations of motion. The integrals (31.19) and (31.20) converge 
sufficiently to admit numerical integration. Nevertheless the 
procedure is cumbersome. The result can be used if the matrix 
elements of A are determined only by way of some probability 
distribution. The statistical expectation of $D(\mu)$ is easier to evaluate 
than the distribution function itself. 

The argument in the present section is characteristic of the 
peculiar problems arising if solid state theories are applied to 
disordered structures such as liquids. 


32. Rectilinear molecules 


Oscillations in three dimensions are in principle fully determined by 
the same theory which was applied to movement in a single dimen- 
sion. In particular, if the masses and force constants are given, 
modes of vibration can be identified and the frequencies calculated. 
In practice calculations rapidly increase in complexity if the number 
of dimensions is increased. 

The main field in which the theory can be applied is the oscilla- 
tions of molecules; the results can frequently be tested by interpreting 
optical, infra-red and Raman spectra. 

In this section the oscillations are considered of triatomic molecules 
which have a rectilinear shape in equilibrium and in which the outer 
atoms are of the same kind. The most important examples for this 
are carbon dioxide and carbon disulphide. A simplified theory of 
these oscillations has been given in Section 29; in a more exact 
calculation such as that given here it is necessary to consider the 
interaction of the two outer atoms and to allow for movement in 
three dimensions. 

The instantaneous position of the three atoms is specified by nine 
displacements and there are nine components of the restoring forces. 
As in the previous case it will be assumed that every force component 
is a linear function of all the displacement components. They are 
accordingly elements of a nine-by-nine matrix B with eighty-one 
elements of which forty-five are independent. Simple, or at least 
manageable, results can be expected only if, on account of the 
symmetry of the molecule, the number of independent matrix ele- 
ments is reduced. 

In one dimension restoring forces are antiparallel to the displace- 
ment by which they are generated. In three dimensions in general it is 
possible that displacements generate forces perpendicular to their 
own direction. If, however, the molecule is rectilinear and the axis 
of the molecule is an axis of cylindrical symmetry it is not possible 
that an axial displacement gives rise to a force perpendicular to the 
axis, and the restoring forces are antiparallel to the displacement. 

Let the equilibrium positions of the atoms be aligned along the 
x axis, and let the components of the displacements be denoted by 
$x_1, y_1, z_1$; $x_2, y_2, z_2$; $x_3, y_3, z_3$. The x component of the restoring 
forces can then depend on $x_1$, $x_2$ and $x_3$ but on no other components 
of displacement. Similarly the y and z components of the forces can 
depend only on the y and z components of the displacements 
respectively. For this reason the equations of motion can be 
formulated for the x, y and z directions independently. 


$$\begin{aligned}
m\,d^2x_1/dt^2 &= b_{11}x_1 + b_{12}x_2 + b_{13}x_3 \\
M\,d^2x_2/dt^2 &= b_{12}x_1 + b_{22}x_2 + b_{23}x_3 \\
m\,d^2x_3/dt^2 &= b_{13}x_1 + b_{23}x_2 + b_{33}x_3
\end{aligned} \tag{32.1}$$

$$\begin{aligned}
m\,d^2y_1/dt^2 &= b_{44}y_1 + b_{45}y_2 + b_{46}y_3 \\
M\,d^2y_2/dt^2 &= b_{45}y_1 + b_{55}y_2 + b_{56}y_3 \\
m\,d^2y_3/dt^2 &= b_{46}y_1 + b_{56}y_2 + b_{66}y_3
\end{aligned} \tag{32.2}$$

$$\begin{aligned}
m\,d^2z_1/dt^2 &= b_{77}z_1 + b_{78}z_2 + b_{79}z_3 \\
M\,d^2z_2/dt^2 &= b_{78}z_1 + b_{88}z_2 + b_{89}z_3 \\
m\,d^2z_3/dt^2 &= b_{79}z_1 + b_{89}z_2 + b_{99}z_3
\end{aligned} \tag{32.3}$$

In these equations M is the mass of the atom labelled 2 and m is 
the mass of the atoms labelled 1 and 3. The matrix elements $b_{jk}$ 
determine the restoring forces. Instead of the possible number of 
forty-five there are only eighteen independent matrix elements and 
the nine-by-nine matrix has been reduced to three three-by-three 
matrices. 

On account of the cylindrical symmetry of the molecule the 
matrices in equations (32.2) and (32.3) must be equal to each other. 
In this way the number of independent constants is reduced to 
twelve. As the atoms 1 and 3 are supposed to be of the same kind 
further simplifications arise: 

$$b_{11} = b_{33}; \quad b_{12} = b_{23}; \quad b_{44} = b_{66}; \quad b_{45} = b_{56} \tag{32.4}$$

As all forces in the molecule are internal forces they cannot give 
rise to any resultant force on the centre of mass or to a couple acting 
on the whole molecule. For these reasons the matrix elements $b_{jk}$ 
are subject to additional restrictions. 

The resultant force has the components 

$$F_x = (b_{11} + b_{12} + b_{13})x_1 + (b_{12} + b_{22} + b_{23})x_2
+ (b_{13} + b_{23} + b_{33})x_3 \tag{32.5}$$

$$F_y = (b_{44} + b_{45} + b_{46})y_1 + (b_{45} + b_{55} + b_{56})y_2
+ (b_{46} + b_{56} + b_{66})y_3 \tag{32.6}$$

$F_z$ is determined by an equation of the form (32.6) with the z 
components of displacement substituted for the y components. 

In obtaining expressions for the components of the resultant 
couple the moment of the forces with respect to the centre of mass is 
taken. The centre of mass is, in the present example, in the position 
of the atom 2. The x component of the couple vanishes since the 
atoms are regarded as extensionless mass points. If the distance 
between neighbouring atoms in their equilibrium positions is 
denoted by D, the z component of the couple is equal to 

$$0 = C_z = D\,[(b_{44} - b_{46})y_1 + (b_{45} - b_{56})y_2 + (b_{46} - b_{66})y_3] \tag{32.7}$$

A similar expression determines the component $C_y$ with the z 
components of displacements substituted for the y components in 
(32.7). 

The conditions $F_x = F_y = 0$, $C_z = 0$ are satisfied by setting 
every coefficient of the displacements in (32.5), (32.6) and (32.7) 
equal to zero. Hence 

$$\begin{aligned}
b_{11} + b_{12} + b_{13} &= 0 & b_{44} + b_{45} + b_{46} &= 0 \\
b_{12} + b_{22} + b_{23} &= 0 & b_{45} + b_{55} + b_{56} &= 0 \\
b_{13} + b_{23} + b_{33} &= 0 & b_{46} + b_{56} + b_{66} &= 0 \\
 & & b_{44} - b_{46} &= 0 \\
 & & b_{45} - b_{56} &= 0 \\
 & & b_{46} - b_{66} &= 0
\end{aligned} \tag{32.8}$$

Not all of these additional restrictions are independent of each 
other. A matrix in which all elements comply with the above 
conditions can be expressed in terms of three positive constants, to be 
denoted by f, g and h. A matrix similar to B and expressible in 
terms of these three constants and m and M, enters into the equations 
of motion which are of the form (29.4) with 

$$\begin{aligned}
s_1 &= m^{1/2}x_1, & s_2 &= M^{1/2}x_2, & s_3 &= m^{1/2}x_3, \\
 & \ldots \\
s_7 &= m^{1/2}z_1, & s_8 &= M^{1/2}z_2, & s_9 &= m^{1/2}z_3
\end{aligned}$$
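The matrix A consists of three sub-matrices in the rows and columns 
1-3, 4-6 and 7-9 respectively and denoted by $A_1$, $A_2$ and $A_3$. We 
have 

$$A_1 = \begin{pmatrix}
-f/m & g/(mM)^{1/2} & (f - g)/m \\
g/(mM)^{1/2} & -2g/M & g/(mM)^{1/2} \\
(f - g)/m & g/(mM)^{1/2} & -f/m
\end{pmatrix}$$

$$A_2 = A_3 = \begin{pmatrix}
-h/2m & h/(mM)^{1/2} & -h/2m \\
h/(mM)^{1/2} & -2h/M & h/(mM)^{1/2} \\
-h/2m & h/(mM)^{1/2} & -h/2m
\end{pmatrix} \tag{32.9}$$

Diagonalization of the matrix $A_1$ leads to a result not very different 
from that of Section 29. The eigenvalues are 

$$\alpha_1 = 0; \qquad
\alpha_{2,3} = -\left(\frac{f}{m} + \frac{g}{M}\right)
\pm \left[\left(\frac{f}{m} - \frac{g}{M}\right)^2
+ \frac{g}{m}\left(\frac{g}{m} + \frac{2g}{M} - \frac{2f}{m}\right)\right]^{1/2} \tag{32.10}$$

The characteristic equation of the matrices $A_2$ and $A_3$ is 

$$\alpha^3 + \alpha^2 h\,[(2/M) + (1/m)] = 0$$

and has the solutions 

$$\alpha_4 = \alpha_5 = \alpha_7 = \alpha_8 = 0; \qquad
\alpha_6 = \alpha_9 = -h\,[(2/M) + (1/m)] \tag{32.11}$$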

Zero frequencies are associated with uniform displacements of all 
atoms in three directions and with rotations of the molecule about the 
y and z directions. Results differ from those of the simplified theory 
by the emergence of three rather than two different frequencies of 
oscillation, one of which is twofold degenerate. The new frequency 
$|\alpha_6|^{1/2}$ corresponds to a parallel movement of the atoms 1 and 3 
conjointly with an antiparallel movement of the particle 2 
perpendicular to the axis of the molecule. 

The theory can be applied to the oscillations of the CO$_2$ and the 
CS$_2$ molecules. Three different fundamental frequencies have been 
identified in the infra-red and Raman spectra. Denoting the 
frequencies by $\omega_1$, $\omega_2$ and $\omega_3$ and by applying equations (32.10) and 
(32.11) the constants f, g, h can be obtained in terms of the frequencies 
and the masses. 


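$$\begin{aligned}
f &= \tfrac{1}{4}\{m + [(2/M) + (1/m)]^{-1}\}(\omega_1^2 + \omega_2^2)
\pm \tfrac{1}{4}\{m - [(2/M) + (1/m)]^{-1}\}(\omega_2^2 - \omega_1^2) \\
g &= \tfrac{1}{2}M(\omega_1^2 + \omega_2^2) - (M/m)f \\
h &= \omega_3^2\,[(2/M) + (1/m)]^{-1}
\end{aligned} \tag{32.12}$$

The ambiguity in the first equation is not serious. If the forces 
between the atoms 1 and 3 are neglected this must restore the validity 
of the simplified theory of Section 29. Then g = f, which is possible 
only if the negative sign is adopted. 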
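The relation between force constants and frequencies can be illustrated numerically. The following Python sketch is not taken from the text. It assumes that the axial matrix has the rows $(-f/m,\ g/(mM)^{1/2},\ (f-g)/m)$, $(g/(mM)^{1/2},\ -2g/M,\ g/(mM)^{1/2})$ and $((f-g)/m,\ g/(mM)^{1/2},\ -f/m)$, that the transverse matrix is normalised so that its non-zero eigenvalue is $-h[(2/M) + (1/m)]$, and that $\omega_1$ belongs to the mode in which the outer atoms move in opposite directions while the central atom is at rest. The function names are hypothetical. The input values are those quoted for CO$_2$ in this section (masses in g, force constants in dyn cm$^{-1}$).

```python
import math


def a1_matrix(f, g, m, M):
    """Axial sub-matrix in mass-weighted coordinates (assumed form)."""
    r = math.sqrt(m * M)
    return [[-f / m, g / r, (f - g) / m],
            [g / r, -2.0 * g / M, g / r],
            [(f - g) / m, g / r, -f / m]]


def a2_matrix(h, m, M):
    """Transverse sub-matrix; rank one, non-zero eigenvalue -h(2/M + 1/m)."""
    v = [1.0 / math.sqrt(m), -2.0 / math.sqrt(M), 1.0 / math.sqrt(m)]
    return [[-0.5 * h * vi * vj for vj in v] for vi in v]


def relative_residual(matrix, alpha, v):
    """max |A v - alpha v| relative to the scale of the matrix and of v."""
    av = [sum(matrix[i][k] * v[k] for k in range(3)) for i in range(3)]
    scale = max(abs(x) for row in matrix for x in row) * max(abs(x) for x in v)
    return max(abs(av[i] - alpha * v[i]) for i in range(3)) / scale


def frequencies(f, g, h, m, M):
    """Circular frequencies (omega_1, omega_2, omega_3) from f, g, h."""
    rate = 2.0 / M + 1.0 / m
    return (math.sqrt((2.0 * f - g) / m),
            math.sqrt(g * rate),
            math.sqrt(h * rate))


def force_constants(w1, w2, w3, m, M):
    """Inverse of frequencies(): the branch with the negative sign."""
    mu = 1.0 / (2.0 / M + 1.0 / m)
    g = mu * w2 ** 2
    f = 0.5 * (m * w1 ** 2 + g)
    h = mu * w3 ** 2
    return f, g, h


m, M = 26.56e-24, 19.94e-24            # g
f, g, h = 1.55e6, 1.43e6, 0.114e6      # dyn/cm
rate = 2.0 / M + 1.0 / m
rm, rM = math.sqrt(m), math.sqrt(M)
A1, A2 = a1_matrix(f, g, m, M), a2_matrix(h, m, M)

checks = [
    (A1, 0.0, [rm, rM, rm]),                               # axial translation
    (A1, -(2.0 * f - g) / m, [1.0, 0.0, -1.0]),            # outer atoms opposed
    (A1, -g * rate, [1.0 / rm, -2.0 / rM, 1.0 / rm]),      # centre against outer
    (A2, 0.0, [rm, rM, rm]),                               # transverse translation
    (A2, 0.0, [-rm, 0.0, rm]),                             # rotation
    (A2, -h * rate, [1.0 / rm, -2.0 / rM, 1.0 / rm]),      # bending
]
residuals = [relative_residual(*c) for c in checks]

w1, w2, w3 = frequencies(f, g, h, m, M)
f_back, g_back, h_back = force_constants(w1, w2, w3, m, M)
print(w1, w2, w3)
print(f_back, g_back, h_back)
```

The round trip from force constants to frequencies and back recovers the input, which is a check on the algebra rather than on the experimental data.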
The following table shows the experimental masses and frequencies 
and the calculated force constants. 

          M       m       ω1      ω2      ω3      f       g       h
         (10^-24 g)       (10^14 sec^-1)          (10^6 dyn cm^-1)
  CO2   19.94   26.56    2.56    4.42    1.25    1.55    1.43    0.114
  CS2   19.94   52.32    1.235   2.81    0.749   0.747   0.692   0.047

In order to render the experimental test of the theory complete it 
would be necessary to find values for the force constants in an 
independent way, for instance by theoretical deduction. So far it can 
only be said that their order of magnitude agrees well with the force 
constants derived from the spectra of diatomic molecules. 


33. Oscillations in three dimensions 

If the equilibrium configuration of a molecule is not rectilinear the 
mathematics becomes increasingly cumbersome although the method 
of the preceding sections remains applicable. Those aspects of the 
theory which are of general applicability are nevertheless worth 
considering; by their use the natural frequencies of specific molecules 
can be deduced to the extent that the necessary computations can 
be performed numerically by automatic computers. 

The theory is simplified if the equilibrium configuration of the 
molecule has a high degree of symmetry. Whereas this symmetry is 
usually reduced by the movement of the atoms the time average over 
the period of a mode of vibration has the full symmetry. On account 
of this it is sometimes possible to identify modes of vibration without 
solving the characteristic equation, or at least to reduce the order of 
the characteristic equation. This principle has been successfully used 
in Section 30. A systematic mathematical method for exploiting 
symmetry properties requires, however, the use of the theory of 
group representation and is outside the scope of this book. 

The equations of motion can be formulated in terms of the 
Cartesian components of the displacements. They are of the form 
(29.4). The vector s has 3N components. The $(3N)^2$ matrix elements 
of A must comply with conditions of the kind considered in (32.8). 
Usually the characteristic equation of the matrix has six solutions 
which are equal to zero. They express the fact that the interatomic 
forces cannot give rise to any resultant force or couple. In order to 
comply with this condition there are 18N equations between the 
matrix elements to be solved. This could in many instances be 
regarded as an unnecessary encumbrance. It can be avoided by the 
use of generalized coordinates the number of which can be smaller 
than the number of Cartesian displacements. 

The position of the particles is then specified in terms of variables 
$(q_1, q_2, \ldots)$ which may be functions of any kind of the coordinates of 
the particles. There is no restriction concerning the admissible 
functions but in the equilibrium configuration all $q_j$ are to be zero. 
The number of these 'generalized coordinates' $q_j$ is usually smaller 
than the number of Cartesian components of the displacement, 
since they do not involve displacements or rotations of the whole 
molecule. In this way the mathematics is simplified at the cost of 
alternative complication. 

The change in time of the generalized coordinates is determined by 
Lagrange's equations 

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right) - \frac{\partial L}{\partial q_j} = 0 \tag{33.1}$$

where $\dot{q}_j$ is the time derivative of $q_j$ and L is the Lagrange function 
equal to the difference of kinetic and potential energy 

$$L = E_{kin} - E_{pot} \tag{33.2}$$

in terms of the generalized coordinates and their time derivatives. 

The potential energy is again, apart from a constant term, equal 
to a quadratic form $\tilde{q}Aq$ where A is a real symmetric matrix and q 
is a vector with the components $q_j$. The kinetic energy can also be 
written as a quadratic form $\tilde{\dot{q}}G\dot{q}$. Here the matrix G is time 
independent and real symmetric. Its eigenvalues are positive (non-zero) 
but it is not necessarily a diagonal matrix. 

In order to solve the equations of motion it is necessary to 
diagonalize A and G simultaneously by a congruent transformation. 
According to the theory of Section 24 this requires the solution of 
the determinantal equation 

$$|A - \gamma G| = 0 \tag{33.3}$$

where $\gamma$ is a parameter. The roots of this equation are equal to the 
ratio of the diagonal elements of the diagonalized matrices 

$$\gamma_j = a'_{jj}/g'_{jj}$$

where A' and G' are diagonal matrices. It is assumed that these 
quantities are negative. If T is the transformation matrix so that 
$A' = \tilde{T}AT$ the displacement q is transformed according to 
$q = Tq'$ or $\tilde{q} = \tilde{q}'\tilde{T}$. In terms of the transformed quantities the 
equations of motion are transformed to 

$$\frac{d}{dt}\left(\frac{\partial L'}{\partial \dot{q}'_j}\right) - \frac{\partial L'}{\partial q'_j} = 0 \tag{33.4}$$

where L' is the Lagrange function in terms of the transformed 
quantities. If both matrices are diagonalized this is 

$$g'_{jj}\,\ddot{q}'_j - a'_{jj}\,q'_j = 0 \tag{33.5}$$
The solutions of this equation are simple harmonic oscillations with 
circular frequencies given by $\omega_j^2 = -a'_{jj}/g'_{jj}$. Thus equation (33.3) 
leads directly to the frequencies. 

In this method the force constants, or matrix elements of A, are 
not subject to any restrictive conditions except those which are due 
to the symmetry of the molecule. Nevertheless conditions similar 
to (32.7) have to be taken into account in a different context. The 
kinetic energy is, at first, given as a function of the Cartesian velocity 
components for which eventually the $\dot{q}_j$ are to be substituted. 
However, the latter are fewer than the former. Thus it is necessary to 
restrict the Cartesian velocity components by constraints; they are 
of such kind that the total linear momentum and the total angular 
momentum of the molecule are to vanish. Then it is possible to 
express the kinetic energy in terms of the $\dot{q}_j$ and the Lagrange 
function in terms of the $q_j$ and $\dot{q}_j$. 


Some insight into this method is provided by the example of the 
next section. 
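For two generalized coordinates the determinantal equation $|A - \gamma G| = 0$ is a quadratic and can be solved in closed form. The following Python sketch is an illustration under stated assumptions: the matrices A and G are invented numbers chosen so that G is positive definite and the roots are negative, the function names are hypothetical, and the frequencies are obtained from $\omega^2 = -\gamma$ in accordance with the sign convention used in this section.

```python
import math


def generalized_roots_2x2(A, G):
    """Roots gamma of |A - gamma G| = 0 for real symmetric 2x2 A and G.

    Expanding the determinant gives
        det(G) gamma^2 - (a11 g22 + a22 g11 - 2 a12 g12) gamma + det(A) = 0.
    """
    c2 = G[0][0] * G[1][1] - G[0][1] ** 2
    c1 = -(A[0][0] * G[1][1] + A[1][1] * G[0][0] - 2.0 * A[0][1] * G[0][1])
    c0 = A[0][0] * A[1][1] - A[0][1] ** 2
    root = math.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return ((-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2))


def secular_determinant(A, G, gamma):
    """Value of |A - gamma G| for 2x2 matrices."""
    p = A[0][0] - gamma * G[0][0]
    q = A[0][1] - gamma * G[0][1]
    r = A[1][1] - gamma * G[1][1]
    return p * r - q * q


A = [[-4.0, 1.0], [1.0, -3.0]]      # hypothetical force-constant matrix
G = [[2.0, 0.5], [0.5, 1.0]]        # hypothetical kinetic-energy matrix

gammas = generalized_roots_2x2(A, G)
omegas = [math.sqrt(-gamma) for gamma in gammas]   # omega^2 = -gamma
print(gammas, omegas)
```

For a larger number of coordinates the same equation would have to be solved numerically, as remarked in the text.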


34. Triangular molecules 


Consider again a triatomic molecule in which a central atom (2) 
of mass M is linked to atoms (1 and 3) each of mass m. These two are 
of the same kind and different from the central atom. Their equili- 
brium configuration is assumed to be an isosceles triangle. 

Let the plane of the equilibrium configuration be the x-y plane. 
Then it is not necessary to consider movement in the direction of the 
z axis since this will modify the movement in the x-y plane only by 
displacements and rotations of the whole molecule. 


If the angle at the vertex 2 is denoted by $\beta$, generalized coordinates 
are defined as follows: 

$$\begin{aligned}
q_1 &= (x_3 - x_2)\cos\beta - (y_3 - y_2)\sin\beta \\
q_2 &= x_3 - x_1 \\
q_3 &= (x_2 - x_1)\cos\beta + (y_2 - y_1)\sin\beta
\end{aligned} \tag{34.1}$$

These three degrees of freedom together with the three translational 
and three rotational degrees of freedom give a total of nine which 
is the correct number for a triatomic molecule. Equilibrium is to 
correspond to $q_1 = q_2 = q_3 = 0$. The changes of the angle $\beta$ during 
oscillations are neglected. 


The potential energy of the molecule is given as a function of the 
generalized coordinates by 

$$E_{pot} = \tilde{q}Aq$$

involving at first six independent matrix elements. They are subject 
to symmetry conditions which are considered at a later stage. 

The total linear momentum and the total angular momentum of 
the molecule should vanish; the angular momentum is taken about 
the mid-point of the line connecting atoms 1 and 3. The conditions 
are 

$$\begin{aligned}
m(\dot{x}_1 + \dot{x}_3) + M\dot{x}_2 &= 0 \\
m(\dot{y}_1 + \dot{y}_3) + M\dot{y}_2 &= 0 \\
m(\dot{y}_1 - \dot{y}_3)\cos\beta - M\dot{x}_2\sin\beta &= 0
\end{aligned} \tag{34.2}$$

The last equation could have been derived also by taking the angular 
momentum about the centre of mass. 

Equations (34.1) are to be differentiated with respect to the time; 
then the $q_j$ on the left-hand side are replaced by $\dot{q}_j$. The Cartesian 
components of displacement on the right-hand side are replaced 
by the corresponding components of velocity. The resulting 
equations, together with (34.2), determine the Cartesian components of 
velocity uniquely as linear functions of the $\dot{q}_j$. Although the 
generalized coordinates are defined in purely geometrical terms the 
relation between the Cartesian and the generalized time derivatives 
(velocities) involves the masses m and M as well. 

The resulting expressions are used for determining the kinetic 
energy in terms of the $\dot{q}_j$ as $E_{kin} = \tilde{\dot{q}}G\dot{q}$. Finally the two matrices 
A and G are simultaneously diagonalized by solving a cubic equation 
of the form (33.3). This systematic procedure is elementary but 
unnecessarily cumbersome. In particular, the determinantal 
equation could be solved by numerical methods only. An alternative 
approach is now considered. 

Simplifications are possible by appealing to molecular symmetry. 
In equilibrium the molecule has a plane of symmetry through atom 
2, perpendicular to its plane and bisecting the angle $\beta$. If the 
molecule is distorted the plane converts $x_1$ to $-x_3$ and $x_3$ to $-x_1$; also 
$x_2$ to $-x_2$; $y_1$ to $y_3$ whereas $y_2$ is not affected. During the period of a 
mode of vibration the symmetry may be maintained throughout; 
alternatively every instantaneous asymmetric configuration must be 
matched by its mirror image during the same period at another 
instant. Symmetry is maintained if $x_1 = -x_3$ and $x_2 = 0$; also if 
$x_1 = x_3 = 0$ and $y_1 = y_3$. Asymmetry in a half period is 
compensated in the other half period if $x_1 = x_3$. 

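It is, accordingly, possible to define generalized coordinates which 
have the same symmetry properties as the modes of vibration: 

$$\begin{aligned}
r_1 &= x_1 - x_3 \\
r_2 &= y_1 + y_3 + \lambda y_2 \\
r_3 &= x_1 + x_3 + \mu x_2
\end{aligned} \tag{34.3}$$

where $\lambda$ and $\mu$ are numerical parameters. 

By differentiating equations (34.3) and taking account of (34.2) 
the Cartesian velocity components are expressed in terms of the time 
derivatives of the generalized coordinates. 

$$\begin{aligned}
\dot{x}_1 &= \tfrac{1}{2}(\dot{r}_1 + \dot{r}_3) & \dot{y}_1 &= \tfrac{1}{2}(\dot{r}_2 - \dot{r}_3\tan\beta) \\
\dot{x}_2 &= -(m/M)\,\dot{r}_3 & \dot{y}_2 &= -(m/M)\,\dot{r}_2 \\
\dot{x}_3 &= \tfrac{1}{2}(-\dot{r}_1 + \dot{r}_3) & \dot{y}_3 &= \tfrac{1}{2}(\dot{r}_2 + \dot{r}_3\tan\beta)
\end{aligned} \tag{34.4}$$

The kinetic energy is accordingly 

$$\begin{aligned}
E_{kin} &= \tfrac{1}{8}m\,[(\dot{r}_1 - \dot{r}_3)^2 + (\dot{r}_1 + \dot{r}_3)^2 + (\dot{r}_2 - \dot{r}_3\tan\beta)^2
+ (\dot{r}_2 + \dot{r}_3\tan\beta)^2 + 4\dot{r}_2^2 + 4\dot{r}_3^2] \\
&= \tfrac{1}{4}m\,[\dot{r}_1^2 + 3\dot{r}_2^2 + \dot{r}_3^2(3 + \tan^2\beta)]
\end{aligned}$$

obtained as a sum of squares without any cross products so that the 
matrix G is diagonal. 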

The potential energy must be the same for the molecule and its 
mirror image. Therefore it must not change if simultaneously 
$x_1$ and $x_3$ are replaced by $-x_3$ and $-x_1$ respectively or if $-x_2$ is 
substituted for $x_2$. By expressing the $a_{jk}r_jr_k$ in terms of Cartesian 
displacements it is found that the factors of $a_{13}$ and $a_{23}$ do not satisfy 
these conditions so that $a_{13}$ and $a_{23}$ and the parameter $\mu$ must 
vanish. 

The determinantal equation has accordingly the form 

$$\begin{vmatrix}
a_{11} - (m/4)\gamma & a_{12} & 0 \\
a_{12} & a_{22} - (3m/4)\gamma & 0 \\
0 & 0 & a_{33} - (m/4)(3 + \tan^2\beta)\gamma
\end{vmatrix} = 0$$

The cubic equation is thus reduced to a linear and a quadratic 
equation. The solutions are 

$$\gamma_1 = (4/m)\,a_{33}\,(3 + \tan^2\beta)^{-1}$$

$$\gamma_{2,3} = (2/3m)\,\{(3a_{11} + a_{22}) \pm [(3a_{11} - a_{22})^2 + 12a_{12}^2]^{1/2}\}$$

If $a_{11}$, $a_{22}$ and $a_{33}$ are negative, $\gamma_1$, $\gamma_2$ and $\gamma_3$ determine the 
frequencies of the oscillations. 
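The step from the determinantal equation to its roots can be written out as follows. This is a sketch of the algebra only; it assumes the diagonal elements $m/4$, $3m/4$ and $(m/4)(3 + \tan^2\beta)$ of the matrix G and the vanishing of $a_{13}$ and $a_{23}$ stated in this section.

```latex
\begin{align*}
\Bigl[a_{33} - \tfrac{m}{4}(3 + \tan^2\beta)\,\gamma\Bigr]
\Bigl[\bigl(a_{11} - \tfrac{m}{4}\gamma\bigr)\bigl(a_{22} - \tfrac{3m}{4}\gamma\bigr) - a_{12}^2\Bigr] &= 0 \\[4pt]
\text{linear factor:}\qquad
\gamma_1 &= \frac{4a_{33}}{m\,(3 + \tan^2\beta)} \\[4pt]
\text{quadratic factor:}\qquad
\tfrac{3m^2}{16}\,\gamma^2 - \tfrac{m}{4}(3a_{11} + a_{22})\,\gamma + (a_{11}a_{22} - a_{12}^2) &= 0 \\[4pt]
\gamma_{2,3} &= \frac{2}{3m}\Bigl\{(3a_{11} + a_{22})
  \pm \bigl[(3a_{11} + a_{22})^2 - 12(a_{11}a_{22} - a_{12}^2)\bigr]^{1/2}\Bigr\} \\
&= \frac{2}{3m}\Bigl\{(3a_{11} + a_{22})
  \pm \bigl[(3a_{11} - a_{22})^2 + 12a_{12}^2\bigr]^{1/2}\Bigr\}
\end{align*}
```

The last form shows that the expression under the root is a sum of squares, so that both roots are real whatever the values of the force constants.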
There are numerous molecules of this kind, water being the most 
important. However, as there are only three frequencies and four 
constants to be fitted a conclusive quantitative test of the theory has 
not yet been carried out. Otherwise the experimental data agree well 
with the theoretical results. 

Spectra of molecules with a greater number of atoms can be 
analysed if they have a sufficiently high degree of symmetry. 


CHAPTER 9 


INVARIANCE, VECTORS AND TENSORS 
IN NATURAL SPACE 


35. Space vectors 


It is assumed that readers are familiar with the elementary vector 
methods which are usually included in courses of physics. In the 
present chapter it is intended to establish connections between the 
elementary properties of vectors and principles of invariance. 
Repetition of some elementary matters cannot be avoided but the 
present chapter is intended to be a supplement to the elementary 
approach rather than a substitute. 

The extensive use of vector methods in physics is of recent origin. 
Vector notation in Maxwell’s treatise on electromagnetic theory was 
regarded as an innovation comparable with the theory itself. It was 
not readily followed up. Vector methods were suspected of being 
incorrect; at the best they were regarded as a kind of shorthand that 
could be dispensed with. By now the critical objections have been 
disproved, but many physicists still regard brevity as the only merit 
of vector methods. 

If an equation of physics claims general validity and is formulated 
in terms of some specific set of coordinates, its formulation in terms 
of alternative coordinates is not known automatically but requires 
explanation. However, if the equation is presented in terms of vector 
symbols no further explanation is required. Vector symbols are not 
tied to any particular set of coordinates but have an unambiguous 
meaning whatever coordinates are employed. 

Consider first Cartesian coordinates with the origin fixed. 
Alternative sets are derived from each other by rotation, i.e. by changing 
the direction of the axes while preserving their orthogonality. It is 
known that equations expressing fundamental laws of physics have 
the same form before and after the rotation. In technical terms, the 
laws of physics are invariant with respect to rotations of the Cartesian 
axes, or briefly, natural space is isotropic. Vector symbols and the 
rules for handling them must be introduced in such a way that this 
isotropy is taken into account. 

Space vectors are closely related to abstract vectors in three 
dimensions but definitions and notations are slightly modified. In 
accordance with Section 1 vectors are defined in terms of three 
numerical quantities (components). This is not stringent enough 
since space vectors must be endowed with the properties of magnitude 
and direction. Direction could be readily defined with respect to some 
particular set of coordinates but this is to be avoided. Therefore an 
indirect definition is required. Direction and magnitude are defined 
by stating in which way the components have to change when the 
axes are rotated. 

The Cartesian coordinates of a point ($x'_1$, $x'_2$ and $x'_3$) can be regarded 
as the components of a 'position vector' denoted by r'. Rotation of 
the axes is expressed by an orthogonal transformation of the position 
vector 

$$r' = Ur'' \tag{35.1}$$

where U is a real orthogonal matrix which determines the new 
coordinates or components of the transformed position vector r''. 

Space vectors are now introduced by the following definition. 

DEFINITION (35.A) Space vectors are defined as a set of three 
Cartesian components ($a'_1$, $a'_2$ and $a'_3$) which are real numbers and 
subject to the condition that a rotation of the coordinates of the 
form (35.1) determines new components according to 

$$a' = Ua'' \tag{35.2}$$

where the matrix U is the same in (35.1) and (35.2). 

According to this definition space vectors can be added and 
multiplied by scalars in accordance with the rules of Section 1. 
Equality of two vectors which is formulated in terms of some parti- 
cular set of coordinates persists after transformation, since the same 
transformation rule is applied to both vectors. Thus in formulating 
any law of physics it is no longer necessary to distinguish between 
a’, a’’ or any other vector derived from a’ by orthogonal transforma- 
tions. Instead a vector symbol a is used which is not tied to any 
specified set of components. Therefore every equation which is 
formulated in terms of vector symbols is automatically invariant with 
respect to rotations of the axes. This is the reason why equations of 
physics are preferably formulated in terms of vectors. 
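The persistence of a vector equation under rotation can be made concrete with a small numerical example. The Python sketch below is illustrative only: the rotation angles, the component values and the helper names are arbitrary assumptions. It forms a vector relation in one set of components, applies the same real orthogonal matrix U to every vector, and confirms that the relation and the magnitudes are unchanged.

```python
import math


def matmul(p, q):
    """Product of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(p, v):
    """Product of a 3x3 matrix and a column vector."""
    return [sum(p[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_about_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_about_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# A real orthogonal matrix built from two elementary rotations.
U = matmul(rotation_about_z(0.7), rotation_about_x(-1.1))

a = [1.0, -2.0, 0.5]
b = [0.3, 4.0, -1.0]
c = [a[i] + 2.0 * b[i] for i in range(3)]    # the relation c = a + 2b

# The same matrix U is applied to every vector, as the definition requires.
a_new, b_new, c_new = (matvec(U, v) for v in (a, b, c))

relation_error = max(abs(c_new[i] - (a_new[i] + 2.0 * b_new[i]))
                     for i in range(3))
print(relation_error, norm(c), norm(c_new))
```

Because the relation holds in every rotated set of components, it may be written with vector symbols alone, without reference to any particular axes.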

Displacement of the origin of the coordinates has the effect of 
adding the displacement $r_0$ to every position vector whereas the 
components of non-localized vectors are not affected. The 
fundamental equations of physics are invariant with respect to these 
displacements. 

The present approach to vectors is readily extended to entities of 
greater complexity, such as tensors and space-time vectors. 


36. Products of vectors: Tensors 


In elementary vector calculus the multiplication of vectors is 
introduced by defining the scalar product and the vector product of two 
vectors. There is an alternative multiplication of vectors which is now 
considered in some detail. 

Let a' and b' be two space vectors.* The product a'b' is defined in 
a way comparable to the direct product of matrices. The product 
a'b' is called a dyadic and is expressed in the form of a three-by- 
three matrix the elements of which are the products of the vector 
components: 

$$a'b' = \begin{pmatrix}
a'_1b'_1 & a'_1b'_2 & a'_1b'_3 \\
a'_2b'_1 & a'_2b'_2 & a'_2b'_3 \\
a'_3b'_1 & a'_3b'_2 & a'_3b'_3
\end{pmatrix} \tag{36.1}$$

The product of linear momentum and linear velocity of a molecule 
in the kinetic theory of gases is an example of a dyadic. The statistical 
expectation of the components of the dyadic is interpreted as the 
flow of momentum. 

In accordance with the transformation rule (35.2) 

$$a' = Ua'', \qquad \tilde{b}' = \tilde{b}''\tilde{U} = \tilde{b}''U^{-1}$$

the dyadic† is transformed as 

$$a'b' = U\,a''b''\,U^{-1} \tag{36.2}$$

This is the familiar transformation rule for matrices. Once this 
transformation rule for dyadics has been established it is admissible 
to discard the reference to any set of coordinates and define dyadics 
in terms of vector symbols denoting them by ab or simply ab. 

* There is no need here to distinguish between row and column vectors. 
However, it helps in clarifying some of the deductions. 
† The expression $\tilde{a}b$ has been used in the early chapters of this book for denoting 
scalar products of abstract vectors. In accordance with prevailing conventions an 
alternative notation is used for space vectors. Thus the scalar product will be 
denoted by p · q and dyadics by pq. 

The transposed dyadic is denoted by 

$$\widetilde{(ab)} = ba \tag{36.3}$$

and usually differs from (ab). It is convenient to split dyadics into 
a symmetric and a skew-symmetric term: 

$$ab = \tfrac{1}{2}[ab + \widetilde{(ab)}] + \tfrac{1}{2}[ab - \widetilde{(ab)}] \tag{36.4}$$


Denoting the second term by G it is readily seen that $\tilde{G} = -G$ and that this relation is orthogonally invariant. The diagonal elements of G must accordingly be zero. In order to derive the transformation rules for the non-diagonal elements it is necessary to consider the matrix G' which is defined in terms of a set of coordinates. Then the transformed matrix elements are obtained as

$g''_{23} = g'_{23}(u_{22}u_{33} - u_{32}u_{23}) + g'_{13}(u_{12}u_{33} - u_{32}u_{13}) + g'_{12}(u_{12}u_{23} - u_{22}u_{13})$
$g''_{13} = g'_{23}(u_{21}u_{33} - u_{31}u_{23}) + g'_{13}(u_{11}u_{33} - u_{31}u_{13}) + g'_{12}(u_{11}u_{23} - u_{21}u_{13})$
$g''_{12} = g'_{23}(u_{21}u_{32} - u_{31}u_{22}) + g'_{13}(u_{11}u_{32} - u_{31}u_{12}) + g'_{12}(u_{11}u_{22} - u_{21}u_{12})$

The coefficients in this transformation are equal to the cofactors of det U; as U is real-orthogonal the cofactors are, by (20.9), equal to the matrix elements to which they are adjugated. Taking account of the definitions in Section 20 it follows that

$g''_{23} = g'_{23}u_{11} - g'_{13}u_{21} + g'_{12}u_{31}$
$g''_{13} = -g'_{23}u_{12} + g'_{13}u_{22} - g'_{12}u_{32}$ (36.5)
$g''_{12} = g'_{23}u_{13} - g'_{13}u_{23} + g'_{12}u_{33}$

It appears that the transformation of these matrix elements is the same as the transformation of a vector with the components

$c'_1 = g'_{23}, \qquad c'_2 = -g'_{13}, \qquad c'_3 = g'_{12}$ (36.6)

according to

$c'' = \tilde{U}c', \qquad c' = Uc''$
Thus under a transformation the skew-symmetric part of a 
dyadic can be replaced by a vector. Since vectors are defined in 


terms of their behaviour under transformations it may be said that 
the skew-symmetric part of a dyadic is a vector. The components of 


the vector c' are

$c'_1 = \tfrac{1}{2}(a'_2b'_3 - a'_3b'_2)$
$c'_2 = \tfrac{1}{2}(a'_3b'_1 - a'_1b'_3)$
$c'_3 = \tfrac{1}{2}(a'_1b'_2 - a'_2b'_1)$

Thus a vector symbol can be used for the skew-symmetric part of the dyadic; in accordance with current convention it is written as half the vector product of a and b

$c = \tfrac{1}{2}(a \times b)$ (36.7)
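The correspondence between the skew-symmetric part of a dyadic and half the vector product can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample vectors are illustrative assumptions and do not come from the text.

```python
def dyadic(a, b):
    """Matrix of the dyadic ab: element (j, k) is a_j * b_k."""
    return [[aj * bk for bk in b] for aj in a]

def skew_part(m):
    """Skew-symmetric part (M - M~)/2 of a 3x3 matrix."""
    return [[0.5 * (m[j][k] - m[k][j]) for k in range(3)] for j in range(3)]

def vector_from_skew(g):
    """Components c1 = g23, c2 = -g13, c3 = g12 (zero-based indices below)."""
    return [g[1][2], -g[0][2], g[0][1]]

def cross(a, b):
    """Ordinary vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Sample vectors (arbitrary illustrative values).
a = [1.0, 2.0, 3.0]
b = [-4.0, 0.5, 2.0]

c = vector_from_skew(skew_part(dyadic(a, b)))
half_cross = [0.5 * x for x in cross(a, b)]
```

Both `c` and `half_cross` hold the same three numbers, as (36.7) asserts.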
The symmetric part of a dyadic cannot be reduced to a vector. It has 


nine matrix elements of which six are independent. According to the 
transformation rule (36.2) and by (21.E) the property of being symmetric is preserved under orthogonal transformations. The trace of the symmetric part is equal to the trace of the dyadic itself and is a scalar. It is equal to the scalar product a·b of the vectors a and b. Otherwise there is little to comment on the properties of the symmetric part.

Whereas the use of dyadics is limited there are important entities in physics which have nine components. Every component is associated with a pair of Cartesian coordinates. If the components are regarded as matrix elements they obey the transformation rule (36.2). They are called tensors of the second order or, if no ambiguity arises, simply tensors.

Skew-symmetric tensors can, like skew-symmetric dyadics, be considered as vectors (although not necessarily a vector product).
Symmetric tensors have little in common with vectors since they are 
associated with two directions rather than a single one. As an 
example consider the stresses in an elastic body from which the 
term ‘tensor’ is derived. The stress is a force acting on an (possibly 
infinitesimal) area. It depends upon the direction of the force and 
upon the direction of the normal to the area. 

The trace of a tensor is a scalar (although not necessarily a scalar 
product). The product of a tensor and a vector is denoted by a dot and results in a vector. Thus A·b and b·A are vectors. The product of two vectors and a tensor is written with two dots a·B·c and is a scalar. The product of two tensors according to the rule of matrix multiplication is written by a dot, A·B and results in a tensor. The trace of a product of two tensors is written with two dots A : B and is a scalar.

Given a symmetric tensor A' it is always possible to find a set of Cartesian coordinates with respect to which the non-diagonal elements of A'' vanish.

Vectors as defined in the preceding section and tensors as defined in this section are not localized. Localized vectors and tensors are connected with some particular value of the position vector. This may be a single point in space, a continuous distribution or anything between these extremes. Thus localized vectors and tensors are functions of the position vector although not necessarily analytic functions.
Whereas vectors are readily visualized as being of definite length 
and orientation there is no simple and general geometrical model that 
applies to tensors. However, if a tensor B is determined by the way of 
a linear relation between vectors p and q 


$q = B \cdot p$ (36.8)


a geometrical representation of the tensor can be given. Assuming that the eigenvalues of the tensor are positive let the reciprocals of their square roots be the semi-axes of an ellipsoid. The centre of the ellipsoid is to coincide with the origin and the direction of the axes is to be the same as the direction of the eigenvectors. If a radius vector in the direction of p
is plotted from the origin it marks a point on the surface of the 
ellipsoid. Consider the tangential plane through that point and the 
perpendicular distance of that plane from the origin. The direction 
of this perpendicular distance vector is equal to the direction of the 
vector q. It is not necessary to present the proof in this book because 
the geometrical construction is of little use and equations (36.8) are 
easy to solve. 
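Although no proof is given, the construction is easy to test numerically. The sketch below works in the principal axes of the tensor and assumes that the semi-axes are the reciprocal square roots of the eigenvalues, which is the choice for which the normal to the tangential plane is parallel to q. The eigenvalues, the vector p and all function names are illustrative assumptions.

```python
import math

def cross(a, b):
    """Ordinary vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def tangent_plane_normal(eigenvalues, p):
    """Normal of the tangential plane at the point where the ray along p
    meets the ellipsoid. Coordinates are the principal axes; the semi-axes
    are assumed to be eigenvalue ** -0.5."""
    semi_axes = [lam ** -0.5 for lam in eigenvalues]
    # Scale p so that the point satisfies sum((x_i / a_i)**2) == 1.
    scale = 1.0 / math.sqrt(sum((pi / ai) ** 2 for pi, ai in zip(p, semi_axes)))
    point = [scale * pi for pi in p]
    # The gradient of sum((x_i / a_i)**2) is normal to the surface.
    return [2.0 * xi / ai ** 2 for xi, ai in zip(point, semi_axes)]

eigenvalues = [1.0, 4.0, 9.0]                        # assumed positive eigenvalues of B
p = [1.0, 2.0, -1.0]
q = [lam * pi for lam, pi in zip(eigenvalues, p)]    # q = B.p in principal axes
normal = tangent_plane_normal(eigenvalues, p)
misalignment = cross(normal, q)                      # vanishes if normal is parallel to q
```

The components of `misalignment` vanish to rounding accuracy, so the perpendicular from the origin onto the tangential plane has the direction of q.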

An example of a symmetric tensor is provided by the moments of 
inertia of rigid bodies. Let J be a tensor with the Cartesian com- 
ponents 


$J_{ik} = \int \rho\,(r^2\delta_{ik} - r_ir_k)\,d\tau$ (36.9)


Here $r_i$ and $r_k$ are the Cartesian coordinates of any point in the interior of the body and r its position vector. $\rho$ is the density and $d\tau$ is the volume element. The integral is to be extended over the whole volume of the body; $\delta_{ik}$ are the components of the unit matrix. The eigenvectors of J are called principal axes of inertia.

J is localized at the centre of the body and is of importance in the 
dynamics of rigid bodies. If L is the angular momentum localized at the centre of mass and $\omega$ is the angular velocity, localized at any point inside the rigid body, these vectors are related by

$L = J \cdot \omega$ (36.10)

The rotational movement of the bodies is determined by (36.10) 

together with 
dL/dt = C (36.11) 
where C is the couple of external forces at the centre of mass. 

Even a superficial knowledge of spinning tops shows that the 
solution of these equations is in general of a complex nature. A 
simple type of movement persists only in the absence of external 
couples and on condition that the initial angular velocity has the 
same direction as a principal axis of inertia. Then L and $\omega$ are parallel and the angular velocity stays constant, both in magnitude and direction.
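The relation (36.10) can be illustrated with a body made of a few point masses, for which the integral in (36.9) becomes a sum. This is only a sketch under that assumption; the masses, positions and function names are invented for illustration.

```python
def cross(a, b):
    """Ordinary vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def inertia_tensor(masses, positions):
    """J_ik = sum over particles of m * (r^2 * delta_ik - r_i * r_k)."""
    J = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x * x for x in r)
        for i in range(3):
            for k in range(3):
                J[i][k] += m * ((r2 if i == k else 0.0) - r[i] * r[k])
    return J

def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def angular_momentum(masses, positions, omega):
    """Direct sum of m * r x (omega x r) for a rigidly rotating body."""
    total = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        l = cross(r, cross(omega, r))
        total = [t + m * li for t, li in zip(total, l)]
    return total

# Four point masses with the centre of mass at the origin (illustrative values).
masses = [1.0, 1.0, 2.0, 2.0]
positions = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 1.0), (0.0, -2.0, -1.0)]
omega = [0.3, -0.2, 0.5]

J = inertia_tensor(masses, positions)
L_from_tensor = mat_vec(J, omega)
L_direct = angular_momentum(masses, positions, omega)
L_about_x = mat_vec(J, [1.0, 0.0, 0.0])   # x axis is a principal axis for this body
```

`L_from_tensor` and `L_direct` agree, and for rotation about the x axis, which is a principal axis of this particular body, `L_about_x` is parallel to the angular velocity.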


37. Fields 


Localized scalars, vectors or tensors which are continuously distri- 
buted in space are called fields. Scalar, vector and tensor fields may 
be mutually related by differentiation with respect to the coordinates. 

If a scalar a is given as a function of the Cartesian coordinates (or of the position vector) it represents a scalar field and is mathematically specified as a(r'). The three partial differential coefficients $\partial a/\partial x'_1$, $\partial a/\partial x'_2$ and $\partial a/\partial x'_3$ are components of a vector, to be denoted by $\nabla' a$.

In proving this assertion the transformation rule is rewritten as

$u_{jk} = \partial x'_j/\partial x''_k, \qquad (U^{-1})_{kj} = \partial x''_k/\partial x'_j$ (37.1)
The partial differentiation in the first equation (37.1) is performed 
with two of the doubly primed coordinates being held constant; the 
partial differentiation in the second equation is performed with two 
of the singly primed coordinates being held constant. 
Since U is an orthogonal matrix it follows that 


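$\partial x'_j/\partial x''_k = \partial x''_k/\partial x'_j$ (37.2)

The differential coefficients of a with respect to the two sets of coordinates are related by

$\dfrac{\partial a}{\partial x'_j} = \sum_k \dfrac{\partial a}{\partial x''_k}\,\dfrac{\partial x''_k}{\partial x'_j}$

and, by (37.1) and (37.2) this is

$\dfrac{\partial a}{\partial x'_j} = \sum_k \dfrac{\partial a}{\partial x''_k}\,\dfrac{\partial x'_j}{\partial x''_k} = \sum_k u_{jk}\,\dfrac{\partial a}{\partial x''_k}$

It follows then that

$\nabla' a = U\nabla'' a$ (37.3)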
Thus it is shown that $\nabla' a$ obeys the transformation rule (35.2) and is, accordingly, a vector corresponding to the vector symbol $\nabla a$. As the above deduction applies to the differential operator $\nabla$ as well as to the differential coefficients, the operator $\nabla$ itself is a vector symbol. Alternative notations for this operator are grad and $\partial/\partial r$. It is seen that this operator can be employed for deriving a vector field from a scalar field.

Once it is understood that $\nabla$ is a vector it follows, virtually without any proof, that the corresponding differentiations of vector fields can be considered as products of the vector $\nabla$ into some other vector. In particular if b(r) represents a vector field then $\nabla b$ (or $\partial b/\partial r$) represents a tensor field. If the transpose is denoted by $\widetilde{(\nabla b)}$ the tensor is resolved into skew-symmetric and symmetric components.

$\nabla b = \tfrac{1}{2}[\nabla b - \widetilde{(\nabla b)}] + \tfrac{1}{2}[\nabla b + \widetilde{(\nabla b)}]$ (37.4)

The first term represents, in accordance with (36.7), a vector field. Its components are

$\dfrac{1}{2}\left(\dfrac{\partial b_3}{\partial x_2} - \dfrac{\partial b_2}{\partial x_3}\right), \qquad \dfrac{1}{2}\left(\dfrac{\partial b_1}{\partial x_3} - \dfrac{\partial b_3}{\partial x_1}\right), \qquad \dfrac{1}{2}\left(\dfrac{\partial b_2}{\partial x_1} - \dfrac{\partial b_1}{\partial x_2}\right)$
In conventional notation they are the components of $\tfrac{1}{2}$(curl b). The second term in (37.4) is a symmetric tensor; its trace is a scalar. This trace is called the divergence of b and is written as

div b, $\nabla \cdot b$ or $(\partial/\partial r) \cdot b$

Of the second order differential operators the most important is Laplace’s operator. It is defined as $\nabla \cdot \nabla$ or $(\partial/\partial r) \cdot (\partial/\partial r)$ and usually denoted by $\nabla^2$. It is a scalar and, hence

$\nabla^2 a = \partial^2 a/\partial x_1^2 + \partial^2 a/\partial x_2^2 + \partial^2 a/\partial x_3^2$

is also a scalar. If C represents a tensor field the expression $\nabla \cdot C$ (where $\nabla$ is considered to be a row vector) represents a vector field which may be called the divergence of the tensor.

An important example is the vector field

$\nabla \cdot \nabla b$

which has the components

$\sum_j \partial^2 b_k/\partial x_j^2$

As they are similar to the Laplace operator some authors use the notation $\nabla^2 b$. This similarity is, however, restricted to Cartesian components. By performing the differentiations in coordinates the identity is established


$\nabla \cdot \nabla b = \nabla(\nabla \cdot b) - \nabla \times (\nabla \times b)$ = grad div b - curl curl b
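The identity can be spot-checked with central finite differences in Cartesian coordinates. The sketch below is an illustration only; the test field `b`, the step size `h` and all function names are assumptions made for the example.

```python
import math

def partial(f, i, h=1e-3):
    """Central-difference partial derivative of a scalar function f(x) with respect to x_i."""
    def df(x):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (f(xp) - f(xm)) / (2.0 * h)
    return df

def comp(field, k):
    """k-th component of a vector field, as a scalar function."""
    return lambda x: field(x)[k]

def div(field):
    return lambda x: sum(partial(comp(field, j), j)(x) for j in range(3))

def grad(scalar):
    return lambda x: [partial(scalar, j)(x) for j in range(3)]

def curl(field):
    def c(x):
        d = lambda j, k: partial(comp(field, k), j)(x)   # d b_k / d x_j
        return [d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0)]
    return c

def vector_laplacian(field):
    def lap(x):
        return [sum(partial(partial(comp(field, k), j), j)(x) for j in range(3))
                for k in range(3)]
    return lap

def b(x):
    """Arbitrary smooth test field (illustrative)."""
    return [x[0] ** 2 * x[1], math.sin(x[2]) * x[0], x[1] * x[2] ** 2]

x0 = [0.4, -0.7, 1.1]
lhs = vector_laplacian(b)(x0)
rhs = [g - c for g, c in zip(grad(div(b))(x0), curl(curl(b))(x0))]
```

For this field both sides equal $(2x_2,\; -x_1\sin x_3,\; 2x_2)$ up to the discretization error of the differences.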


38. Strain and stress 


Important examples of tensor fields are provided by the mechanics 
of continuous matter. Consider in particular an elastic body and let 
the origin of coordinates be placed at the centre of mass. Let the 
particles of which the body consists be moved over small distances 
and assume that the local displacement (u) can be expressed as a 
function of the position vector (r): 


$u = u(r), \qquad u(0) = 0$

This represents a vector field. It can be expanded as

$u = (\partial u/\partial r) \cdot r + O(r^2)$ (38.1)


The tensor $\partial u/\partial r$ is split into a skew-symmetric and a symmetric part; the former is equal to $\tfrac{1}{2}$ curl u. If it were uniform it would describe a rotation of the body about the centre of mass. In general both the symmetric and the skew-symmetric parts are functions of the position. The symmetric part of the tensor


$e = \dfrac{1}{2}\left[\dfrac{\partial u}{\partial r} + \widetilde{\left(\dfrac{\partial u}{\partial r}\right)}\right]$ (38.2)

has the components


$e_{jj} = \dfrac{\partial u_j}{\partial x_j}, \qquad e_{jk} = e_{kj} = \dfrac{1}{2}\left(\dfrac{\partial u_j}{\partial x_k} + \dfrac{\partial u_k}{\partial x_j}\right)$

and is called the strain tensor. Its trace


$e_{11} + e_{22} + e_{33} = \nabla \cdot u$


is equal to the local increment in specific volume divided by the 
initial specific volume. The tensor 


$\gamma = e - \tfrac{1}{3}(\mathrm{tr}\,e)I$ (38.3)


is a measure for the local change in shape after the change in volume 
has been taken into account. The strain tensor is dimensionless and, 
as all displacements are assumed to be small, the strain components 
are small compared with unity. 

Thus the vector field of displacements can be expressed in terms of 
the vector field of local rotation and the tensor field of strain. 

The forces in an elastic body are expressed in terms of another 
tensor, that is the stress tensor $\sigma$. It determines forces per unit of area,
acting on any infinitesimal area in the interior or on the surface of the 
elastic body. The specific properties are due to the fact that the force 
and its reaction attack at the same point. Thus the direction in which 
the force acts is reversed by 180° if the area is approached from the 
opposite direction. Every component of stress is associated with two 
directions in space, the direction of the normal on the infinitesimal 
area and the direction of the force acting on it. The component $\sigma_{12}$, for example, is equal to the component of force in the $x_1$ direction acting on an infinitesimal area with a normal in the positive $x_2$ direction. The trace of the stress tensor is equal to (−3) times the isotropic pressure (also called hydrostatic pressure). The tensor


$\tau = \sigma - \tfrac{1}{3}(\mathrm{tr}\,\sigma)I$ (38.4)


determines the excess of the stress components over the local 
negative pressure. These components depend on the direction but are 
not vectors. 

In the theory of elasticity, in particular that of elastic equilibrium, 
one has to determine the distribution of stresses if the local displace- 
ments are given or vice versa. In either case it is necessary to start 
with some general relation between the forces and the displacements. 
A relation of this kind is known as Hooke’s law, according to which 
local stresses are independent of the local curl of the displacement 
and linear functions of the local strain. If the assumption is added 
that the material of which the body consists is isotropic the form of 
the strain-stress relation is completely determined. 

As the traces of the strain and the stress tensors are scalars their 
mutual dependence must be independent of the other components. 
Hence the decrease in the isotropic pressure must be proportional to the relative increment in specific volume. The tensors $\gamma$ and $\tau$ must also be proportional to each other:

$\tfrac{1}{3}\,\mathrm{tr}\,\sigma = K\,\mathrm{tr}\,e$
$\sigma - \tfrac{1}{3}(\mathrm{tr}\,\sigma)I = 2s[e - \tfrac{1}{3}(\mathrm{tr}\,e)I]$ (38.5)

The constants K and s are independent of each other and have to be taken from experiment. They are known as bulk modulus and shear modulus respectively. In anisotropic materials the equations (38.5) would have a more complex form and involve a larger number of constants.

Although equations (38.5) are correct they are unsuitable for application since they involve seven scalar equations whereas there are only six unknown quantities which may be components of the strain or the stress tensor. The number of equations is, however, readily reduced. If the first equation is multiplied by I and added to the second equation we obtain


$\sigma = 2se + (K - \tfrac{2}{3}s)(\mathrm{tr}\,e)I$ (38.6)

as a complete expression for Hooke’s law.

It may be desirable to employ alternative elastic constants which are more closely related to measurements. For this purpose consider the dilatation of a fibre of circular or square cross section. Let the $x_3$ axis coincide with the axis of the fibre; let the fibre be stretched without any forces acting on the lateral surfaces. Then $\sigma_{11} = \sigma_{22} = 0$; also all non-diagonal components of the strain and stress tensor are zero and $e_{11} = e_{22}$ by reason of symmetry.

Then equations (38.6) reduce to


$(2K + \tfrac{2}{3}s)e_{11} + (K - \tfrac{2}{3}s)e_{33} = 0$
$(2K - \tfrac{4}{3}s)e_{11} + (K + \tfrac{4}{3}s)e_{33} = \sigma_{33}$ (38.7)

By eliminating $e_{11}$ it follows that

$\sigma_{33} = \left[\dfrac{9Ks}{3K + s}\right]e_{33}$ (38.8)


and from the first equation (38.7) it follows that


$e_{11} = -\left[\dfrac{3K - 2s}{2(3K + s)}\right]e_{33}$ (38.9)

Denoting the bracket expressions in (38.8) and (38.9) by E (Young’s modulus) and $\nu$ (Poisson’s ratio) respectively and expressing K and s in terms of these two new constants, Hooke’s law can be rewritten in the form


$\sigma = \dfrac{E}{1 + \nu}\left[e + \dfrac{\nu}{1 - 2\nu}(\mathrm{tr}\,e)I\right]$ (38.10)
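The change of elastic constants can be verified with a few lines of arithmetic. The sketch below assumes Young's modulus $E = 9Ks/(3K+s)$ and Poisson's ratio $\nu = (3K-2s)/[2(3K+s)]$, applies Hooke's law in both forms to the strain of a stretched fibre, and uses invented moduli; the function names are hypothetical.

```python
def young_and_poisson(K, s):
    """Young's modulus and Poisson's ratio from bulk modulus K and shear modulus s."""
    E = 9.0 * K * s / (3.0 * K + s)
    nu = (3.0 * K - 2.0 * s) / (2.0 * (3.0 * K + s))
    return E, nu

def stress_bulk_shear(e, K, s):
    """sigma = 2 s e + (K - 2s/3)(tr e) I."""
    tr = sum(e[j][j] for j in range(3))
    return [[2.0 * s * e[j][k] + ((K - 2.0 * s / 3.0) * tr if j == k else 0.0)
             for k in range(3)] for j in range(3)]

def stress_young_poisson(e, E, nu):
    """sigma = E/(1+nu) [e + nu/(1-2nu) (tr e) I]."""
    tr = sum(e[j][j] for j in range(3))
    f = E / (1.0 + nu)
    return [[f * (e[j][k] + (nu / (1.0 - 2.0 * nu) * tr if j == k else 0.0))
             for k in range(3)] for j in range(3)]

K, s = 160.0, 80.0             # illustrative moduli in arbitrary units
E, nu = young_and_poisson(K, s)

e33 = 1.0e-3                   # longitudinal strain of the fibre
strain = [[-nu * e33, 0.0, 0.0],
          [0.0, -nu * e33, 0.0],
          [0.0, 0.0, e33]]

sigma_a = stress_bulk_shear(strain, K, s)
sigma_b = stress_young_poisson(strain, E, nu)
```

Both forms give vanishing lateral stresses and a longitudinal stress equal to `E * e33`.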




Elastic equilibrium is determined by the relation 
$\nabla \cdot \sigma + F = 0$ (38.11)


where F is the force per unit volume (‘body force’) acting on the 
elastic body. The most important body force is gravity. Substitution of (38.2) into (38.10) and (38.11) results in


$\dfrac{E}{2(1 + \nu)}\left[\nabla \cdot \nabla u + \dfrac{1}{1 - 2\nu}\,\nabla(\nabla \cdot u)\right] + F = 0$ (38.12)


This set of simultaneous equations determines the field of displacements if the field of forces and appropriate boundary conditions are given. By substituting the product of density and acceleration, i.e. $\rho\,\partial^2 u/\partial t^2$, on the right-hand side of (38.12) equations are obtained which determine the propagation of elastic waves.


39. Spherical polar coordinates 


Problems in field theories frequently require the use of curvilinear, in 
particular spherical polar, coordinates. The fundamental equations 
of physics are not affected by rotations of the Cartesian axes but they 
change their form if rectilinear coordinates are replaced by curvi- 
linear ones. Nevertheless the formalism of vector calculus applies 
whatever coordinates are used provided that the vector symbols 
admit an unambiguous interpretation. As far as spherical polar co- 
ordinates are concerned the interpretation of vector symbols is based 
on the fact that within small volumes the local set of polar coordinates 
differs from the Cartesian set merely by the orientation of the axes. 

Spherical polar coordinates are denoted by r, $\vartheta$ and $\varphi$ and are defined by a non-linear transformation


$x_1 = r\sin\vartheta\cos\varphi$
$x_2 = r\sin\vartheta\sin\varphi$ (39.1)
$x_3 = r\cos\vartheta$


Surfaces of constant r are spherical; $\vartheta$ (varying from 0 to $\pi$) determines the latitude and $\varphi$ (varying from 0 to $2\pi$) determines the longitude on spherical surfaces. r can vary from 0 to $\infty$. The volume element is


$dx_1\,dx_2\,dx_3 = r^2\sin\vartheta\,dr\,d\vartheta\,d\varphi$
Let position vectors in this section be denoted by s. ‘Components’ 
of this vector would not have any simple meaning. However, the 


infinitesimal increments ds still obey the transformation rule (35.2). 
By differentiating (39.1) it appears that 


$ds' = U\,ds''$ (39.2)

where here and subsequently the single prime refers to Cartesian coordinates and the double prime to the curvilinear set. The transformation matrix is orthogonal.


$U = \begin{pmatrix} \sin\vartheta\cos\varphi & \cos\vartheta\cos\varphi & -\sin\varphi \\ \sin\vartheta\sin\varphi & \cos\vartheta\sin\varphi & \cos\varphi \\ \cos\vartheta & -\sin\vartheta & 0 \end{pmatrix}$ (39.3)

Hence

$ds'^2 = ds''^2 = ds_1''^2 + ds_2''^2 + ds_3''^2$

where


$ds''_1 = dr; \qquad ds''_2 = r\,d\vartheta; \qquad ds''_3 = r\sin\vartheta\,d\varphi$ (39.4)
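Both statements, that U is orthogonal and that it maps the increments (39.4) onto the Cartesian increments, can be tested numerically. The sketch below takes the third column of U as $(-\sin\varphi, \cos\varphi, 0)$, which is what orthogonality requires; the sample point, the increments and the function names are illustrative assumptions.

```python
import math

def polar_matrix(theta, phi):
    """Transformation matrix U of (39.3) at polar angle theta and azimuth phi."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, ct * cp, -sp],
            [st * sp, ct * sp, cp],
            [ct, -st, 0.0]]

def cartesian(r, theta, phi):
    """Cartesian coordinates according to (39.1)."""
    return [r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta)]

r, theta, phi = 2.0, 0.7, 1.9
U = polar_matrix(theta, phi)

# U~ U should be the unit matrix.
gram = [[sum(U[m][j] * U[m][k] for m in range(3)) for k in range(3)]
        for j in range(3)]

# Small increments of the polar coordinates (illustrative values).
dr, dtheta, dphi = 1e-6, 2e-6, -1e-6
ds_polar = [dr, r * dtheta, r * math.sin(theta) * dphi]
ds_cart_from_U = [sum(U[j][k] * ds_polar[k] for k in range(3)) for j in range(3)]

x_before = cartesian(r, theta, phi)
x_after = cartesian(r + dr, theta + dtheta, phi + dphi)
ds_cart_direct = [b - a for a, b in zip(x_before, x_after)]
```

The two Cartesian increments agree up to terms of second order in the increments.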


Curvilinear components of vectors (other than position vectors) are 
derived from their Cartesian components by the transformation 
inverse to (39.2). The same transformation matrix is used for trans- 
forming Cartesian tensor components in accordance with the rule 
(36.2). The r, $\vartheta$ and $\varphi$ components of vectors will be denoted by 1, 2 and 3 respectively.

Once this connection between the Cartesian and curvilinear com- 
ponents of vectors and tensors is established the interpretation of 
vector symbols follows almost automatically. Multiplication rules 
for forming the dyadic, vector and scalar products are the same as 
before, if Cartesian are replaced by curvilinear components. 

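The vector field $\nabla a$ can be interpreted by transforming its Cartesian components. Its direct definition implies that the limit $\Delta s \to 0$ is taken of $\Delta a/\Delta s$. Thus

$\nabla''_1 = \dfrac{\partial}{\partial r}, \qquad \nabla''_2 = \dfrac{1}{r}\dfrac{\partial}{\partial\vartheta}, \qquad \nabla''_3 = \dfrac{1}{r\sin\vartheta}\dfrac{\partial}{\partial\varphi}$
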
Differentiation of vector fields requires some care. Spatial derivatives 
must take account not only of changes in magnitude and direction 
but also of location. Even if the vector field were uniform the curvi- 
linear components would vary from point to point on account of the 
changing directions of the meridians and parallels. Therefore the 
components of $(\nabla b)''$ are not equal to the result of operating with $\nabla''$ on the vector b''.

It is necessary to express at first the Cartesian vectors $\nabla'$ and b' in terms of their curvilinear components:


$\nabla' = U\nabla'', \qquad b' = b''U^{-1}$

From these expressions the Cartesian tensor

$\nabla' b' = U(\nabla'' b''U^{-1})$ (39.5)

is obtained. Here the bracket indicates that $\nabla''$ operates on the product of the two and only two following quantities. Having obtained the Cartesian expression it is now transformed back to polar coordinates.


$(\nabla b)'' = U^{-1}(\nabla' b')U = U^{-1}U(\nabla'' b''U^{-1})U = (\nabla'' b''U^{-1})U$ (39.6)


By this method of transforming the Cartesian components all the concepts of vector algebra and calculus can be interpreted in terms of spherical polar coordinates. It is not necessary to follow this up. In conclusion the most important formulae are compiled.


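$(\nabla b)'' = \begin{pmatrix} \dfrac{\partial b''_1}{\partial r} & \dfrac{\partial b''_2}{\partial r} & \dfrac{\partial b''_3}{\partial r} \\ \dfrac{1}{r}\left(\dfrac{\partial b''_1}{\partial\vartheta} - b''_2\right) & \dfrac{1}{r}\left(\dfrac{\partial b''_2}{\partial\vartheta} + b''_1\right) & \dfrac{1}{r}\dfrac{\partial b''_3}{\partial\vartheta} \\ \dfrac{1}{r\sin\vartheta}\left(\dfrac{\partial b''_1}{\partial\varphi} - b''_3\sin\vartheta\right) & \dfrac{1}{r\sin\vartheta}\left(\dfrac{\partial b''_2}{\partial\varphi} - b''_3\cos\vartheta\right) & \dfrac{1}{r\sin\vartheta}\left(\dfrac{\partial b''_3}{\partial\varphi} + b''_1\sin\vartheta + b''_2\cos\vartheta\right) \end{pmatrix}$ (39.7)


$\nabla \cdot b = \dfrac{1}{r^2}\dfrac{\partial}{\partial r}(r^2 b''_1) + \dfrac{1}{r\sin\vartheta}\dfrac{\partial}{\partial\vartheta}(b''_2\sin\vartheta) + \dfrac{1}{r\sin\vartheta}\dfrac{\partial b''_3}{\partial\varphi}$ (39.8)

$\mathrm{curl}_1\,b'' = \dfrac{1}{r\sin\vartheta}\left[\dfrac{\partial}{\partial\vartheta}(b''_3\sin\vartheta) - \dfrac{\partial b''_2}{\partial\varphi}\right]$
$\mathrm{curl}_2\,b'' = \dfrac{1}{r\sin\vartheta}\dfrac{\partial b''_1}{\partial\varphi} - \dfrac{1}{r}\dfrac{\partial}{\partial r}(r b''_3)$ (39.9)
$\mathrm{curl}_3\,b'' = \dfrac{1}{r}\left[\dfrac{\partial}{\partial r}(r b''_2) - \dfrac{\partial b''_1}{\partial\vartheta}\right]$

$\nabla^2 a = \dfrac{1}{r^2}\dfrac{\partial}{\partial r}\left(r^2\dfrac{\partial a}{\partial r}\right) + \dfrac{1}{r^2\sin\vartheta}\dfrac{\partial}{\partial\vartheta}\left(\sin\vartheta\,\dfrac{\partial a}{\partial\vartheta}\right) + \dfrac{1}{r^2\sin^2\vartheta}\dfrac{\partial^2 a}{\partial\varphi^2}$ (39.10)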


There are other sets of orthogonal curvilinear coordinates than spherical polar ones which are derived from the Cartesian by an orthogonal transformation. The most important of these are the cylindrical, paraboloidal and ellipsoidal coordinates. The interpretation of vector symbols in terms of these coordinates proceeds in close analogy to the arguments of the present section.


40. Space-time vectors 


If the isotropy of natural space should become the object of critical scrutiny it could be argued that the phenomena of physics are compatible with the use of vector and tensor methods. In particular all observable quantities are scalars, vectors or tensors. This is sufficient
for establishing isotropy and could be refuted only by exhibiting 
phenomena which could not be specified in those terms. 

In particular in the theory of relativity a generalization of vector 
and tensor methods has indeed played an important part. The theory 
of relativity claims ex hypothesi that the equations of physics are to be 
invariant with respect to certain transformations which are to some 
extent similar to the rotations of Cartesian axes. On account of this 
similarity additional types of vectors and tensors have been intro- 
duced to physics. If the equations of physics are formulated in terms 
of the generalized vector symbols their postulated invariance is 
automatically established. In the present section this branch of vector methods is surveyed to the extent to which it is used in special relativity. It
is assumed that readers are familiar with the essential features of the 
theory. 

The equations of Newtonian mechanics are invariant with respect 
to transformations of the Cartesian coordinates which include not 
only the rotation of the axes but also time-dependent displacements 
of the origin: 


$r'' = Ur' - vt$ (40.1)


Here U can be any real orthogonal matrix and v is a constant velocity determining the relative movement of the two origins; t is the time. This relation between two frames of reference is called a Galileo transformation. If the axes in the two sets have the same directions the transformation simplifies to


$x''_1 = x'_1 - vt; \qquad x''_2 = x'_2; \qquad x''_3 = x'_3; \qquad v_2 = v_3 = 0$ (40.2)

The corresponding rule for transforming velocities (u) is


$u''_1 = u'_1 - v; \qquad u''_2 = u'_2; \qquad u''_3 = u'_3$ (40.3)


In the special theory of relativity a different transformation rule 
applies. In contrast to (40.1) it involves not only the position vector 
but also the time. In formulating the transformation rule it is con- 
venient to introduce vectors of four dimensions. The particular form 
of these vectors is then chosen in such a way that the resulting 
formulae reduce to those of ordinary Newtonian mechanics in the 
limit $v \ll c$ (c being the speed of light in vacuo). Thus a ‘four-vector x'’ is defined in such a way that the components $x'_1$, $x'_2$, $x'_3$ are Cartesian coordinates and $x'_4 = ict'$, where t' is the time measured with respect to a particular frame of reference. The transformation rule is then the ‘Lorentz transformation’


$x'' = Vx'$ (40.4)

where

$V = WV_0$ (40.5)

with

$W = \begin{pmatrix} U & 0 \\ 0 & 1 \end{pmatrix}$

the three-by-three sub-matrix U being real orthogonal, and

$(V_0)_{jk} = \delta_{jk} + \{[1 - (v/c)^2]^{-1/2} - 1\}\,v_jv_k/v^2, \qquad (V_0)_{j4} = -(V_0)_{4j} = i(v_j/c)[1 - (v/c)^2]^{-1/2}, \qquad (V_0)_{44} = [1 - (v/c)^2]^{-1/2} \qquad (j, k = 1, 2, 3)$

The matrix V is complex-orthogonal. For most purposes it is not 
necessary to consider the transformation in its most general form. It 


is sufficient to put U = I and $v_2 = v_3 = 0$, $v_1 = v$. Then the special transformation, corresponding to (40.2), is determined by the matrix


$V = [1 - (v/c)^2]^{-1/2}\begin{pmatrix} 1 & 0 & 0 & i(v/c) \\ 0 & [1 - (v/c)^2]^{1/2} & 0 & 0 \\ 0 & 0 & [1 - (v/c)^2]^{1/2} & 0 \\ -i(v/c) & 0 & 0 & 1 \end{pmatrix}$ (40.6)
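The complex-orthogonality of the special Lorentz matrix, and the invariance of the sum of squares of the four components, can be checked with ordinary complex arithmetic. The following is a sketch with c set to 1 and an arbitrary event; the function names and numerical values are illustrative assumptions.

```python
import math

def lorentz_matrix(v, c=1.0):
    """Special Lorentz matrix for relative velocity v along the x1 axis,
    acting on four-vectors (x1, x2, x3, i*c*t)."""
    beta = v / c
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, 0, 0, 1j * gamma * beta],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [-1j * gamma * beta, 0, 0, gamma]]

def mat_vec(m, x):
    return [sum(m[j][n] * x[n] for n in range(4)) for j in range(4)]

c = 1.0
V = lorentz_matrix(0.6, c)

# V times its transpose (no complex conjugation) should be the unit matrix.
V_Vt = [[sum(V[j][n] * V[k][n] for n in range(4)) for k in range(4)]
        for j in range(4)]

x_prime = [1.0, 2.0, 3.0, 1j * c * 4.0]      # event at t' = 4 (illustrative)
x_dprime = mat_vec(V, x_prime)
norm_prime = sum(z * z for z in x_prime)      # x1^2 + x2^2 + x3^2 - c^2 t^2
norm_dprime = sum(z * z for z in x_dprime)
```

The first component of `x_dprime` is real and the fourth is purely imaginary, as required for a space coordinate and for ict respectively.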
The corresponding transformation of velocities differs from the non-relativistic relation (40.3) and is given by

$u''_1 = \dfrac{u'_1 - v}{1 - (u'_1v/c^2)}$

$u''_2 = \dfrac{u'_2[1 - (v/c)^2]^{1/2}}{1 - (u'_1v/c^2)}$ (40.7)

$u''_3 = \dfrac{u'_3[1 - (v/c)^2]^{1/2}}{1 - (u'_1v/c^2)}$

In contrast to (40.3) the velocity components perpendicular to the relative velocity of the two frames of reference are affected by the transformation.

The magnitude of the position vector is invariant:

$x_1'^2 + x_2'^2 + x_3'^2 - c^2t'^2 = x_1'^2 + x_2'^2 + x_3'^2 + x_4'^2 = x_1''^2 + x_2''^2 + x_3''^2 + x_4''^2$

Four-vectors are now defined as four-component abstract vectors
which comply with the transformation rule (40.4), such as laid down 
for ‘position vectors’ x. According to the special theory of relativity 
all equations should be invariant with respect to Lorentz trans- 
formations. This can be demonstrated by formulating them in terms 
of four-vector symbols, or in terms of tensors derived from these 
vectors by the methods of Section 36. Whereas vector formalism was 
a matter of convenience as far as natural space is concerned, it is here 
an important method for testing the consistency of the space-time 
theories. 

The electromagnetic theory according to Maxwell and Lorentz has 
the property of being invariant with respect to Lorentz transforma- 
tions. It is merely necessary to apply a suitable notation for demon- 
strating this invariance by the use of vector notation. Thus a four- 
vector A’ is defined; the first three components are supposed to be 
equal to the components of the vector potential; the fourth component is equal to the scalar potential multiplied by i. Another four-vector J' has three components equal to the components of current density, the fourth component being equal to the density of charge multiplied by ic. Denoting the d’Alembert operator


$\dfrac{\partial^2}{\partial x_1'^2} + \dfrac{\partial^2}{\partial x_2'^2} + \dfrac{\partial^2}{\partial x_3'^2} + \dfrac{\partial^2}{\partial x_4'^2}$


by $\Box$ the equation determining the propagation of electromagnetic waves becomes


$(4\pi J_n/c) + \Box A_n = 0 \qquad (n = 1 \ldots 4)$


where the first term vanishes in the absence of matter. Similarly 
other relations of electromagnetic theory admit formulation in terms 
of four-vectors. 

Newtonian mechanics is not compatible with the claims of Lorentz 
invariance. The essential innovations due to special relativity were 
accordingly the changes applied to the equations of mechanics. In the 
mechanical equations of motion the time plays the part of an inde- 
pendent variable whereas the positions of particles are dependent 
variables. This approach is incompatible with Lorentz invariance. 
Thus another independent variable is employed. The differential ds, 
defined by 


$ds^2 = dt'^2 - \dfrac{1}{c^2}(dx_1'^2 + dx_2'^2 + dx_3'^2)$

is called the differential of invariant time and is a scalar. It is employed as the independent variable in particle mechanics. The substitution of ds for dt as the independent variable is virtually sufficient for introducing the necessary changes in the laws of mechanics. If the Newtonian velocity is defined as

$u'_1 = \dfrac{dx'_1}{dt'}; \qquad u'_2 = \dfrac{dx'_2}{dt'}; \qquad u'_3 = \dfrac{dx'_3}{dt'}$

$u'^2 = u_1'^2 + u_2'^2 + u_3'^2$


it is readily ascertained that


$\dfrac{dt'}{ds} = \left[1 - \left(\dfrac{u'}{c}\right)^2\right]^{-1/2}$

A velocity four-vector is now defined as

$q = \dfrac{dx'}{ds}$


Its components are


$q_j = u'_j\left[1 - \left(\dfrac{u'}{c}\right)^2\right]^{-1/2} \qquad (j = 1, 2, 3)$

$q_4 = ic\left[1 - \left(\dfrac{u'}{c}\right)^2\right]^{-1/2}$ (40.8)


For a mass point of mass m the product mq plays the part of momentum and the equations of motion are:


$\dfrac{d}{ds}(mq) = F$ (40.9)


The four-vector on the right-hand side has three components repre- 
senting the force in accordance with its meaning in experiments. The 
fourth component has actually to be defined in such a way that equa- 
tion (40.9) is satisfied. The mass m is a scalar and a constant of the 
motion. Frequently the expression $m[1 - (u/c)^2]^{-1/2}$ is called the ‘mass’ as distinct from the ‘rest mass’ m.

The kinetic energy of a mass point is equal to


$-icmq_4 = mc^2[1 - (u/c)^2]^{-1/2}$.

Thus momentum and energy are, apart from constant factors, components of a four-vector. The component $F_4$ of the force is given by

$F_4 = \dfrac{i}{c}(F_1u_1 + F_2u_2 + F_3u_3)$

This modification of Newtonian mechanics is consistent. Special 

relativity becomes identical with Newtonian mechanics in the limit 


of small speeds ($v \ll c$) and its deviations at higher speed have been
confirmed by experiment. 
Thus it is shown that the ideas underlying the ordinary vector
methods are applicable in wider fields of physics than they were 
originally designed for. In fact their applicability is by no means 
restricted to the special theory of relativity. 


CHAPTER 10 


MATRICES IN CLASSICAL STATISTICAL MECHANICS


41. Introduction 


The mathematics required for formulating the principles of classical 
statistical mechanics does not include matrices or abstract vectors. 
These concepts are, however, used in applications of the general 
theory to specific objects. An example is considered in this chapter; it 
concerns the thermal properties of mixed crystals.

If a simple cubic lattice is formed by atoms of two kinds they may alternate regularly along the lattice points in the direction of the crystallographic axes. Alignments of this kind are stable at low temperatures, if at all. If it is known which kind of atom occupies a given lattice point the kind of atom occupying any far distant lattice point can be traced with a probability markedly higher than 50%. This state of affairs is briefly called ‘long-distance order’. At higher
temperatures some of the order is lost on account of local irregu- 
larities; a residual long-distance order will still persist, as the regions 
of irregularity are enveloped by well aligned regions. Beyond a 
certain ‘critical’ temperature all long-distance order vanishes. The 
probability for any lattice point being occupied by one or the other 
kind of atom is 50% whatever may be known of other lattice points 
not in the immediate surroundings. The collapse of long-distance 
order occurs abruptly with increasing temperature and is called an 
‘order-disorder transition’. It is accompanied by anomalies in the
specific heat and other thermodynamic quantities. 

A similar transition occurs at the Curie point where ferromagnetic 
bodies become paramagnetic. This is connected with the long-distance 
alignment of electronic spins and its break-down at higher temperatures.

In attempting to establish a quantitative theory of these phenomena
some simplifying assumptions are usually made which are not too 
remote from reality. It is assumed that alignment is exclusively due 
to the interaction of atoms which are nearest neighbours in the 
lattice. Every lattice point is endowed with a configurational variable 
which can take two values only, usually defined as ±1. They correspond to the kind of atom occupying the lattice point or to the orientation of the spin. If the variables at two neighbouring lattice points are equal to each other (1, 1 or −1, −1), they are to contribute an amount $-\varepsilon$ to the energy; if they differ from each other (1, −1 or −1, 1) they are to contribute an amount $\varepsilon$ to the energy. In the present case $\varepsilon$ will be assumed to be negative. The contributions of all pairs of neighbours in the lattice are to add up to a configurational energy $E_\nu$, which is independent of the kinetic and potential energies
of lattice vibrations. The values of the configurational variables are 
not to affect the entropy of the crystal, so that every configuration 
of the crystal occupies the same amount of phase volume. 

It may be noted that the relation between the values of the con- 
figurational variables and the configurational energy is of an un- 
common kind. Every contribution of $\pm\varepsilon$ depends upon the values of
two different variables and every variable is an influence in a number 
of different contributions. 

There are $2^N$ different configurations (N being the number of lattice points) of which every one makes a definite contribution to the energy. It is the main object of the theory to find the thermal energy and specific heat of the crystal. The probabilities of the configurations are proportional to $\exp(-E_\nu/kT)$, where T is the absolute temperature and $k = 1.38 \times 10^{-16}$ erg/degC is Boltzmann’s constant. The thermal energy is to be obtained as that expectation of energy determined by the probability distribution. It is to be derived by way of the partition function


$\zeta(T) = \sum_\nu \exp(-E_\nu/kT)$


the sum being taken over all configurations. The thermal energy is equal to


$kT^2\,\dfrac{\partial}{\partial T}(\log\zeta)$
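The relation between the partition function and the thermal energy can be made concrete by brute-force enumeration of a very small system. The sketch below replaces the crystal by a closed ring of six sites with nearest-neighbour coupling, which is an assumption made purely for illustration, and uses units in which k = 1; all function names are hypothetical.

```python
import itertools
import math

def ring_energy(spins, eps):
    """Configurational energy of a closed ring: -eps for equal neighbours, +eps otherwise."""
    n = len(spins)
    return sum(-eps if spins[i] == spins[(i + 1) % n] else eps for i in range(n))

def partition_function(n_sites, eps, kT):
    """Sum of exp(-E/kT) over all 2**n_sites configurations."""
    return sum(math.exp(-ring_energy(s, eps) / kT)
               for s in itertools.product((1, -1), repeat=n_sites))

def thermal_energy_from_log(n_sites, eps, T, k=1.0, h=1e-5):
    """k T^2 d(log zeta)/dT, with the derivative taken by central differences."""
    up = math.log(partition_function(n_sites, eps, k * (T + h)))
    down = math.log(partition_function(n_sites, eps, k * (T - h)))
    return k * T * T * (up - down) / (2.0 * h)

def thermal_energy_direct(n_sites, eps, T, k=1.0):
    """Expectation value of the energy over the probability distribution."""
    weights = [(ring_energy(s, eps), math.exp(-ring_energy(s, eps) / (k * T)))
               for s in itertools.product((1, -1), repeat=n_sites)]
    zeta = sum(w for _, w in weights)
    return sum(e * w for e, w in weights) / zeta

n_sites, eps, T = 6, -1.0, 1.5        # illustrative values; eps negative as in the text
energy_a = thermal_energy_from_log(n_sites, eps, T)
energy_b = thermal_energy_direct(n_sites, eps, T)
zeta = partition_function(n_sites, eps, T)
```

For the ring the enumerated `zeta` can also be compared with the known closed form $(2\cosh x)^N + (2\sinh x)^N$, where $x = \varepsilon/kT$.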


It is possible to deduce the partition function successfully only for 
the two-dimensional net of atoms which is used as a model for a 
crystal lattice. This is shown in the following sections. Extensive use 
will be made of matrix algebra. In this chapter certain special types 
of matrices will be used again. 


42. Formal solution 


Consider a rectangular net of m chains with n sites in every chain. The 
energy of interaction is denoted by ±ε′ if it concerns pairs of atoms 
within a chain and by ±ε if it concerns pairs of atoms in different 
chains. At first ε′ and ε are supposed to differ from each other. A 
one-dimensional periodic boundary condition is adopted by assuming 
that the atoms in the mth chain interact with the atoms in the 
first. Every chain has $2^n$ possible configurations; let them be 
labelled by numbers ν (or μ) which may be equal to 1, 2 ... $2^n$. 
Configurations of different chains are denoted by distinguishing 
between ν(1), ν(2) ... ν(j) ... ν(m). Every set of these numbers denotes 
a configuration of the whole net. The relative probability of the 
configuration ν is equal to $\exp(-E_\nu/kT)$ where $E_\nu$ is its 
configurational energy. Let the probability of the configuration be 
denoted by u(ν, ν); in order to indicate configurations of the first 
chain the notation u[ν(1), ν(1)] is used. If the first and second chain 
are considered jointly the joint probability of a configuration ν(1) 
and ν(2) is equal to 

$$u[\nu(1),\nu(1)]\;v[\nu(1),\nu(2)]\;u[\nu(2),\nu(2)]$$

the second factor takes account of the interaction of the atoms 
between two chains and the resulting effect on the joint probability. 
Continuing, a third, fourth and eventually an mth chain are added. 
The joint probability for the configurations of all chains is the 
probability of a configuration of the net. It is given by the expression 

$$u[\nu(1),\nu(1)]\;v[\nu(1),\nu(2)]\;u[\nu(2),\nu(2)]\;v[\nu(2),\nu(3)] \cdots v[\nu(m),\nu(1)] \tag{42.1}$$


If use is made of the Kronecker symbol δ(μ, ν) we have 

$$u[\nu(j),\nu(j)] = \sum_{\mu(j)} u[\nu(j),\mu(j)]\;\delta[\mu(j),\nu(j)] \tag{42.2}$$

The partition function of the net is equal to the sum of expressions 
(42.1) with respect to all possible values of ν(1), ν(2) ... ν(m). Taking 
account of (42.2) this can be expressed as 

$$\zeta = \sum_{\nu(1)} \cdots \sum_{\nu(m)} \sum_{\mu(1)} \cdots \sum_{\mu(m)} \prod_{j=1}^{m} u[\nu(j),\mu(j)]\;\delta[\mu(j),\nu(j)]\;v[\nu(j),\nu(j+1)] \tag{42.3}$$

where ν(m + 1) = ν(1) and μ(m + 1) = μ(1). 
This expression is readily interpreted as an instance of matrix 
multiplication. Let V be a matrix with the elements v(ν, μ) and U a 
diagonal matrix with the non-vanishing elements u(ν, ν). Then the 
product of the first three factors in (42.3) summed with respect to 
μ(1) results in an element of the matrix UV. The whole product 
summed with respect to μ(2) ... μ(m), ν(2) ... ν(m) yields a diagonal 
element of $(UV)^m$. The last summation with respect to ν(1) gives 
the trace of $(UV)^m$ or $(VU)^m$. If m is a large number the trace of 
$(UV)^m$ can be replaced by the mth power of the largest eigenvalue of 
UV. 
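The statement that the partition function of the net equals the trace of the mth power of UV can be tested by brute force on a net small enough to enumerate. The following sketch is written under stated assumptions: the chains have open ends (no bond between site n and site 1), the chains themselves are joined periodically as in the text, and `a_chain`, `a_cross` stand for ε′/kT and ε/kT. All names are hypothetical.

```python
import math
from itertools import product

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def build_U_V(n, a_chain, a_cross):
    # a_chain = eps'/kT for bonds inside a chain, a_cross = eps/kT between chains.
    states = list(product((1, -1), repeat=n))
    size = len(states)
    U = [[0.0] * size for _ in range(size)]
    for i, c in enumerate(states):
        # Diagonal element: weight of the bonds inside one chain (open ends).
        U[i][i] = math.exp(a_chain * sum(s * t for s, t in zip(c, c[1:])))
    # v(nu, mu): weight of the n bonds joining neighbouring chains nu and mu.
    V = [[math.exp(a_cross * sum(s * t for s, t in zip(c, d))) for d in states]
         for c in states]
    return states, U, V

def zeta_by_trace(n, m, a_chain, a_cross):
    # Trace of the m-th power of UV.
    _, U, V = build_U_V(n, a_chain, a_cross)
    UV = matmul(U, V)
    power = UV
    for _ in range(m - 1):
        power = matmul(power, UV)
    return sum(power[i][i] for i in range(len(power)))

def zeta_by_enumeration(n, m, a_chain, a_cross):
    # Direct sum of the Boltzmann factors over all configurations of the net.
    states = list(product((1, -1), repeat=n))
    total = 0.0
    for net in product(states, repeat=m):
        exponent = 0.0
        for j, chain in enumerate(net):
            nxt = net[(j + 1) % m]  # periodic: chain m is followed by chain 1
            exponent += a_chain * sum(s * t for s, t in zip(chain, chain[1:]))
            exponent += a_cross * sum(s * t for s, t in zip(chain, nxt))
        total += math.exp(exponent)
    return total
```

With n = 2 and m = 3 the enumeration runs over only 64 configurations, so both functions can be compared directly.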


The deduction of this eigenvalue is not merely a problem of 
diagonalization but also of constructing the two matrices. 


43. Expressions for matrices 


In the arguments of the preceding section reference is made to 
matrices U and V but they have not yet been defined; they should 
have $2^n$ rows and columns. These matrices will now be expressed in 
terms of other matrices of the same number of dimensions but of 
greater simplicity. 

In equation (10.1) matrices in two dimensions have been defined 
and denoted by I, Y and Z. Repeated direct multiplication of these 
matrices is now used for defining matrices of $2^n$ dimensions, 

$$Z_r = I \times I \times \cdots \times I \times Z \times I \times \cdots \times I$$
$$Y_r = I \times I \times \cdots \times I \times Y \times I \times \cdots \times I \tag{43.1}$$

In these expressions the rth factor is Z or Y respectively; r = 1 ... n. 

The matrices $Z_r$ are diagonal. Every diagonal element corresponds 
to a configuration of a chain. They can be divided into two classes, 
e.g. those in which the configurational variable at r has the values 
1 and −1 respectively. The diagonal elements of $Z_r$ have the same 
value (±1) as the configurational variable at r. The matrix 
$Z_r Z_{r+1}$ is also diagonal. Its diagonal elements are 1 if the 
configurational variables at r and r + 1 are equal to each other; the 
diagonal elements are −1 if the configurational variables at r and 
r + 1 are unequal. It follows that the diagonal elements of 
$-Z_r Z_{r+1}$ are equal, in units of ε′, to the contribution of the 
sites r and r + 1 to the configurational energy. The diagonal matrix 
$-\varepsilon' \sum_r Z_r Z_{r+1}$ is in every diagonal element equal to 
the value of the total configurational energy of the chain in one of 
its configurations. 
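The correspondence between diagonal elements and chain configurations can be made concrete by forming the direct products explicitly. The sketch below assumes that Z in (10.1) is the diagonal matrix with elements 1 and −1 (an assumption, since (10.1) is not reproduced here) and uses hypothetical helper names.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
Z2 = [[1, 0], [0, -1]]  # assumed form of Z in (10.1)

def kron(A, B):
    # Direct (Kronecker) product of two square matrices.
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def site_matrix(M, r, n):
    # I x ... x M x ... x I with M as the r-th of n factors (r = 1 ... n).
    out = [[1]]
    for j in range(1, n + 1):
        out = kron(out, M if j == r else I2)
    return out

def diagonal(M):
    return [M[i][i] for i in range(len(M))]

def pair_diagonal(r, n):
    # Diagonal of Z_r Z_{r+1}; both factors are diagonal, so the
    # product is obtained element by element.
    dr = diagonal(site_matrix(Z2, r, n))
    ds = diagonal(site_matrix(Z2, r + 1, n))
    return [x * y for x, y in zip(dr, ds)]

def chain_configurations(n):
    # Same ordering as the rows of the direct product: last site varies fastest.
    return list(product((1, -1), repeat=n))
```

For n = 3, `diagonal(site_matrix(Z2, 2, 3))` lists the value of the second configurational variable in each of the eight configurations returned by `chain_configurations(3)`.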

The matrix U is diagonal and its diagonal elements must be 
equal to $\exp(-E_\nu/kT)$. It follows that 

$$U = \exp\Bigl(a' \sum_r Z_r Z_{r+1}\Bigr) \tag{43.2}$$

where 

$$a' = \varepsilon'/kT \tag{43.3}$$
The matrix elements of V are conditional probabilities. They express 
the probability of chain configurations when the configuration 
of the neighbouring chain is known but they do not take account of 
the effect on probability of the configurational energy of the chain 
itself. The matrix V is constructed by considering a row of lattice 
points across the chains. Let f(s, 1) and f(s, −1) be the probabilities 
with which the configurational variable at the point in the chain s 
takes the values ±1 respectively. The corresponding probabilities 
for the point in the chain s + 1 are then determined by 

$$f(s+1,\,1) = f(s,1)\,e^{a} + f(s,-1)\,e^{-a}$$
$$f(s+1,-1) = f(s,1)\,e^{-a} + f(s,-1)\,e^{a}$$

where 

$$a = \varepsilon/kT \tag{43.4}$$

Thus the neighbouring points in the chains s and s + 1 contribute 
to the matrix V a sub-matrix 

$$\begin{bmatrix} e^{a} & e^{-a} \\ e^{-a} & e^{a} \end{bmatrix} = e^{a} I + e^{-a} Y \tag{43.5}$$

It is convenient to express V in terms of a function of a rather than 
in terms of a itself. The function is denoted by b, and 

$$\tanh b = e^{-2a} \tag{43.6}$$

Taking account of (20.13) expression (43.5) becomes 

$$(2\sinh 2a)^{1/2}\,[(\cosh b)I + (\sinh b)Y] = (2\sinh 2a)^{1/2}\,e^{bY} \tag{43.7}$$
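The substitution (43.6) is easily checked with numbers. The sketch below, with hypothetical function names and valid only for a > 0 (so that tanh b lies between 0 and 1), rebuilds the sub-matrix (43.5) from the right-hand side of (43.7).

```python
import math

def b_of_a(a):
    # Solve tanh b = exp(-2a) for b; requires a > 0.
    return math.atanh(math.exp(-2.0 * a))

def sub_matrix_from_b(a):
    # (2 sinh 2a)^(1/2) [(cosh b) I + (sinh b) Y] written out as a 2x2 array,
    # with Y taken as the matrix that exchanges the two components.
    b = b_of_a(a)
    pref = math.sqrt(2.0 * math.sinh(2.0 * a))
    d, o = pref * math.cosh(b), pref * math.sinh(b)
    return [[d, o], [o, d]]
```

The diagonal elements returned by `sub_matrix_from_b(a)` should reproduce exp(a) and the off-diagonal elements exp(−a).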
The matrix V consists of sub-matrices such as (43.5) about the 
leading diagonal; all elements outside the sub-matrices vanish. A 
matrix of this kind is equal to the n-fold product of matrices 

$$[e^{a}]J + [e^{-a}]Y_r,$$

where J is a unit matrix in $2^n$ dimensions. Then, by (43.7) 

$$V = (2\sinh 2a)^{n/2}\exp\Bigl(b\sum_r Y_r\Bigr) = (2\sinh 2a)^{n/2}\,W \tag{43.8}$$

where 

$$W = \exp\Bigl(b\sum_r Y_r\Bigr) \tag{43.9}$$

The partition function of the net is equal to 

$$\zeta = (2\sinh 2a)^{mn/2}\,\gamma^{m} \tag{43.10}$$

where γ is the largest eigenvalue of 

$$WU = \exp\Bigl(b\sum_r Y_r\Bigr)\exp\Bigl(a'\sum_r Z_r Z_{r+1}\Bigr) \tag{43.11}$$

In attempting to diagonalize WU it would not be practicable to 
proceed by the way of the characteristic equation. An indirect 
approach is required. As a first step, the properties of diagonal 
matrices which have some similarity with WU are considered. 
Matrices of this kind are 

$$K_r = \exp(-\tfrac12 g_r Z_r) = (\cosh\tfrac12 g_r)J - (\sinh\tfrac12 g_r)Z_r \tag{43.12}$$

where $g_r$ are numbers. The diagonal elements are $\exp(\pm\tfrac12 g_r)$. 
The product of these matrices is derived: 

$$K = \prod_r \exp(-\tfrac12 g_r Z_r) = \exp[\tfrac12(\pm g_1 \pm g_2 \pm \cdots \pm g_n)] \tag{43.13}$$

Here every set of + and − signs in the exponent appears in one and 
only one diagonal element. (43.13) is a standard expression for a 
diagonal matrix in $2^n$ dimensions. 
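The remark that every set of signs appears in exactly one diagonal element can be illustrated by listing the diagonal of K for a few arbitrary numbers. The sketch below is a minimal illustration with hypothetical names; it uses the fact that the diagonal element of each Z_r is the configurational variable s_r = ±1.

```python
import math
from itertools import product

def k_diagonal(g):
    # Diagonal of K = prod_r exp(-g_r Z_r / 2); the diagonal element of Z_r
    # in a given configuration is s_r = +1 or -1.
    return [math.exp(-0.5 * sum(gr * s for gr, s in zip(g, config)))
            for config in product((1, -1), repeat=len(g))]

def sign_sums(g):
    # exp[(+-g_1 +- g_2 ... +- g_n)/2] for every choice of signs.
    return [math.exp(0.5 * sum(sign * gr for sign, gr in zip(signs, g)))
            for signs in product((1, -1), repeat=len(g))]
```

Sorting both lists for, say, g = [0.3, 1.1, 2.0] gives the same eight numbers.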

The unknown diagonalized matrix WU should be a special 
instance of (43.13). The above expressions are not suitable for 
further deductions. This is due to the commutation rules for the 
matrices $Y_r$ and $Z_r$. All pairs of these matrices commute except the 
anticommuting $Y_r Z_r = -Z_r Y_r$. It is possible to simplify calculations 
by substituting for the $Y_r$ and $Z_r$ a set of 2n alternative variables all 
pairs of which anticommute. This substitution is carried out in the 
next section and will prove useful in spite of the somewhat tiresome 
set of manipulations required. 


44. Anticommuting matrices 
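Consider now the following set of matrices 

$$P_r = Y_1 Y_2 \cdots Y_{r-1} Z_r, \qquad Q_r = i\,Y_1 Y_2 \cdots Y_{r-1} Y_r Z_r \tag{44.1}$$

It is readily verified that 

$$P_r^2 = Q_r^2 = J; \qquad P_r P_s = -P_s P_r \quad (r \neq s)$$
$$Q_r Q_s = -Q_s Q_r \quad (r \neq s); \qquad P_r Q_s = -Q_s P_r \tag{44.2}$$

and 

$$Y_r = iP_rQ_r; \qquad Z_r = Y_1 Y_2 \cdots Y_{r-1} P_r; \qquad Z_r Z_{r+1} = -iP_{r+1}Q_r \tag{44.3}$$

Further 

$$(iP_rQ_s)^2 = J \tag{44.4}$$
$$\exp(igP_rQ_s) = (\cosh g)J + i(\sinh g)P_rQ_s \tag{44.5}$$

where g is any number. If t ≠ r and t ≠ s all $P_t$ and $Q_t$ commute 
with (44.5). 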
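The multiplication rules for the new matrices can be verified mechanically for a short chain. The sketch below assumes Y and Z of (10.1) to be the exchange matrix and the diagonal matrix with elements 1, −1, and it takes P_r and Q_r as products of Y_1 ... Y_{r−1} with Z_r and with iY_rZ_r respectively; these forms and all helper names are assumptions made for the illustration.

```python
I2 = [[1, 0], [0, 1]]
Y = [[0, 1], [1, 0]]    # assumed form of Y in (10.1)
Z = [[1, 0], [0, -1]]   # assumed form of Z in (10.1)

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def site(M, r, n):
    # I x ... x M x ... x I with M as the r-th of n factors.
    out = [[1]]
    for j in range(1, n + 1):
        out = kron(out, M if j == r else I2)
    return out

def unit(n):
    # Unit matrix J in 2**n dimensions.
    return site(I2, 1, n)

def P(r, n):
    # P_r = Y_1 Y_2 ... Y_{r-1} Z_r
    out = unit(n)
    for j in range(1, r):
        out = matmul(out, site(Y, j, n))
    return matmul(out, site(Z, r, n))

def Q(r, n):
    # Q_r = i Y_1 Y_2 ... Y_{r-1} Y_r Z_r
    out = unit(n)
    for j in range(1, r + 1):
        out = matmul(out, site(Y, j, n))
    return scale(1j, matmul(out, site(Z, r, n)))

def anticommute(A, B):
    # True if AB + BA vanishes in every element.
    AB, BA = matmul(A, B), matmul(B, A)
    return all(x + y == 0 for ra, rb in zip(AB, BA) for x, y in zip(ra, rb))
```

All matrix elements are 0, ±1 or ±i, so the comparisons are exact; for n = 3 the matrices have eight rows and columns.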

In terms of the new matrix variables the expressions (43.2) and 
(43.9) are changed to 

$$U(a') = \prod_r \exp(-ia'P_{r+1}Q_r) \tag{44.6}$$

$$W(b) = \prod_r \exp(ibP_rQ_r) \tag{44.7}$$

These relations will be used for deriving transformation formulae 
of an unusual kind. Consider the collineatory transformation 

$$M' = \exp(igP_rQ_s)\,M\,\exp(-igP_rQ_s) \tag{44.8}$$

If M is one of the matrices $P_t$ or $Q_t$ with t ≠ r and t ≠ s it 
follows from the commutation rules that M′ = M. 
If $M = P_r$ then 

$$\begin{aligned}
P_r' &= [(\cosh g)J + i(\sinh g)P_rQ_s]\,P_r\,[(\cosh g)J - i(\sinh g)P_rQ_s] \\
     &= [(\cosh g)J + i(\sinh g)P_rQ_s]\,[(\cosh g)P_r - i(\sinh g)Q_s] \\
     &= (\cosh^2 g + \sinh^2 g)P_r - 2i(\sinh g\cosh g)Q_s \\
     &= (\cosh 2g)P_r - i(\sinh 2g)Q_s
\end{aligned} \tag{44.9}$$

If $M = Q_s$ then 

$$\begin{aligned}
Q_s' &= [(\cosh g)J + i(\sinh g)P_rQ_s]\,Q_s\,[(\cosh g)J - i(\sinh g)P_rQ_s] \\
     &= [(\cosh g)Q_s + i(\sinh g)P_r]\,[(\cosh g)J - i(\sinh g)P_rQ_s] \\
     &= (\cosh^2 g + \sinh^2 g)Q_s + 2i(\cosh g\sinh g)P_r \\
     &= (\cosh 2g)Q_s + i(\sinh 2g)P_r
\end{aligned} \tag{44.10}$$
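The two transformation formulae can also be confirmed numerically. The sketch below does this for a chain of two sites with r = 1 and s = 2; it assumes the same forms of Y, Z, P and Q as used for illustration elsewhere (Y the exchange matrix, Z diagonal with elements 1, −1, P_1 = Z_1, Q_2 = iY_1Y_2Z_2), and all names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
Y = [[0, 1], [1, 0]]    # assumed form of Y in (10.1)
Z = [[1, 0], [0, -1]]   # assumed form of Z in (10.1)

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def combine(c1, A, c2, B):
    # Linear combination c1*A + c2*B.
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

J = kron(I2, I2)
Y1, Y2, Z1, Z2 = kron(Y, I2), kron(I2, Y), kron(Z, I2), kron(I2, Z)
P1 = Z1                                                              # P_r with r = 1
Q2 = [[1j * x for x in row] for row in matmul(matmul(Y1, Y2), Z2)]   # Q_s with s = 2

def transform(M, g):
    # exp(ig P1 Q2) M exp(-ig P1 Q2), the exponentials expanded as
    # (cosh g) J +- i (sinh g) P1 Q2.
    PQ = matmul(P1, Q2)
    E = combine(math.cosh(g), J, 1j * math.sinh(g), PQ)
    E_inv = combine(math.cosh(g), J, -1j * math.sinh(g), PQ)
    return matmul(matmul(E, M), E_inv)

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))
```

`transform(P1, g)` should agree with (cosh 2g)P1 − i(sinh 2g)Q2, and `transform(Q2, g)` with (cosh 2g)Q2 + i(sinh 2g)P1, up to rounding.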
In particular $WP_rW^{-1}$, $WQ_rW^{-1}$, $UP_{r+1}U^{-1}$ and 
$UQ_rU^{-1}$ can be obtained by means of (44.9) and (44.10); in 
accordance with the commutation rules the matrices W, $W^{-1}$, 
U, $U^{-1}$ may, for the purpose of these transformations, be replaced 
by $\exp(\pm ibP_rQ_r)$ and $\exp(\mp ia'P_{r+1}Q_r)$ respectively. It follows 
that 

$$WP_rW^{-1} = (\cosh 2b)P_r - i(\sinh 2b)Q_r$$
$$WQ_rW^{-1} = i(\sinh 2b)P_r + (\cosh 2b)Q_r \tag{44.11}$$

$$UP_{r+1}U^{-1} = (\cosh 2a')P_{r+1} + i(\sinh 2a')Q_r$$
$$UQ_rU^{-1} = -i(\sinh 2a')P_{r+1} + (\cosh 2a')Q_r \tag{44.12}$$

The two-by-two matrices on the right-hand side of (44.11) and 
(44.12) will be denoted by D(2b) and D(−2a′) respectively. The 
significance of these transformations is considered in the next section. 


45. Relations between matrices of different dimensions 


The left-hand sides of (44.11) and (44.12) specify collineatory matrix 
transformations in $2^n$ dimensions. The right-hand sides have the 
forms of vector transformations as defined by (19.3). The 
transformation matrix is complex-orthogonal and of two dimensions, 
irrespective of the fact that the vector components themselves are 
higher order matrices. 

Let ρ be a vector of 2n dimensions with the components $P_1$, 
$Q_1$, $P_2$, $Q_2$ ... $P_n$, $Q_n$ and apply the transformations (44.11) and 
(44.12) to all of its components. Then the right-hand sides represent 
a vector transformation of the form (19.3) 

$$\rho' = \Phi(b)\rho, \qquad \rho' = \Psi(a')\rho \tag{45.1}$$

The matrices Φ and Ψ are complex-orthogonal and consist of 
n two-by-two sub-matrices about the leading diagonal. The 
sub-matrices have the elements 

$$\begin{bmatrix} \phi(2r-1,\,2r-1) & \phi(2r-1,\,2r) \\ \phi(2r,\,2r-1) & \phi(2r,\,2r) \end{bmatrix} \tag{45.2}$$

and 

$$\begin{bmatrix} \psi(2r,\,2r) & \psi(2r,\,2r+1) \\ \psi(2r+1,\,2r) & \psi(2r+1,\,2r+1) \end{bmatrix} \tag{45.3}$$

where r = 1 ... n. All elements outside the sub-matrices vanish. 
The sub-matrices are equal to D(2b) and D(−2a′) respectively. 

By these transformations a specific relationship is established 
between the matrices W(b) and the matrices Φ(b). In particular it 
follows from (44.7) that 

$$W(b_1)W(b_2) = W(b_1 + b_2)$$

On the other hand repeated application of (44.11) shows that also 

$$\Phi(b_1)\Phi(b_2) = \Phi(b_1 + b_2)$$

The relations between the matrices U(a′) and Ψ(a′) are of a similar 
kind. 
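The properties claimed for the two-by-two sub-matrices are quickly confirmed with numbers. The sketch below takes D(g) to be the matrix with rows (cosh g, −i sinh g) and (i sinh g, cosh g), as read off from the right-hand side of (44.11) with 2b replaced by g; the helper names are hypothetical.

```python
import math

def D(g):
    # Two-by-two matrix on the right-hand side of (44.11), with 2b replaced by g.
    c, s = math.cosh(g), math.sinh(g)
    return [[c, -1j * s], [1j * s, c]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))
```

`matmul(D(g1), D(g2))` reproduces `D(g1 + g2)`, and the product of `D(g)` with its transpose (not its Hermitean conjugate) is the unit matrix, which is what complex-orthogonal means here.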

W and U are not the only matrices in $2^n$ dimensions which are 
related in this specific way to matrices of 2n dimensions. In attempting 
to find the eigenvalues of WU it is important to identify a 2n × 2n 
matrix (Λ) to which the diagonal matrix K in (43.13) is related in an 
analogous way, so that a relation between the eigenvalues of K and Λ 
can be established. If the matrix obtained by diagonalizing ΦΨ is a 
special instance of the matrix obtained by diagonalizing Λ, the 
eigenvalues of ΦΨ can be used for deriving the eigenvalues of WU. 

For this purpose it is necessary to identify a set of 2n matrices 
($L_r$ and $N_r$) which comply with the multiplication rules (44.2) if they 
are substituted for $P_r$ and $Q_r$ respectively. Also the products 
$L_rN_r$ are to be proportional to $Z_r$. It is not a foregone conclusion 
that matrices of this kind exist. Their construction is nevertheless 
simple. 

A matrix F of $2^n$ dimensions is defined in terms of the direct nth 
power of the matrix 

$$\frac{1}{\sqrt{2}}\,(Y - Z)$$

so that 

$$F = (\tfrac12)^{n/2}\,(Y - Z) \times (Y - Z) \times \cdots \times (Y - Z)$$

As $(Y - Z)Y(Y - Z) = -2Z$ and $(Y - Z)Z(Y - Z) = -2Y$ 
it follows that 

$$FY_rF^{-1} = -Z_r \quad\text{and}\quad -FZ_rF^{-1} = Y_r \tag{45.4}$$
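The two identities for Y − Z, and the exchange of Y_r and Z_r which follows from them, can be checked with integer arithmetic if the normalizing factor is left out. The sketch below assumes Y to be the exchange matrix and Z the diagonal matrix with elements 1, −1 (the definitions of (10.1) are not reproduced here); the names are hypothetical.

```python
I2 = [[1, 0], [0, 1]]
Y = [[0, 1], [1, 0]]    # assumed form of Y in (10.1)
Z = [[1, 0], [0, -1]]   # assumed form of Z in (10.1)
Y_MINUS_Z = [[y - z for y, z in zip(ry, rz)] for ry, rz in zip(Y, Z)]

def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def site(M, r, n):
    # I x ... x M x ... x I with M as the r-th of n factors.
    out = [[1]]
    for j in range(1, n + 1):
        out = kron(out, M if j == r else I2)
    return out

def unscaled_F(n):
    # (Y - Z) x (Y - Z) x ... x (Y - Z); F itself carries the factor 2**(-n/2).
    out = [[1]]
    for _ in range(n):
        out = kron(out, Y_MINUS_Z)
    return out

def conjugate(M, n):
    # G M G with G = unscaled_F(n). Because G G = 2**n J, this is 2**n F M F^-1.
    G = unscaled_F(n)
    return matmul(matmul(G, M), G)
```

For n = 2 the factor 2**n is 4, so `conjugate(site(Y, r, 2), 2)` should equal −4 Z_r and `conjugate(site(Z, r, 2), 2)` should equal −4 Y_r.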
Then the matrices $L_r$ and $N_r$ are defined as 

$$L_r = FP_rF^{-1}, \qquad N_r = FQ_rF^{-1} \tag{45.5}$$

The multiplication rules (44.2) are invariant with respect to 
collineatory transformations. It follows that the matrices $L_r$ and $N_r$ 
comply with these rules. Also, by (43.1) and (44.3) 

$$\begin{aligned}
-Z_r = FY_rF^{-1} &= iFP_rQ_rF^{-1} \\
                  &= iFP_rF^{-1}FQ_rF^{-1} \\
                  &= iL_rN_r
\end{aligned} \tag{45.6}$$

Consider the matrices (43.12) 

$$K_r = \exp(-\tfrac12 g_r Z_r) = \exp(\tfrac12 i g_r L_rN_r)$$

By applying the arguments of Section 44, in particular equations 
(44.8)-(44.11), it follows that 

$$K_rL_rK_r^{-1} = (\cosh g_r)L_r - i(\sinh g_r)N_r$$
$$K_rN_rK_r^{-1} = i(\sinh g_r)L_r + (\cosh g_r)N_r$$

and consequently 

$$K\sigma K^{-1} = \Lambda\sigma \tag{45.7}$$

where σ is a vector with the components $L_1$, $N_1$ ... $L_n$, $N_n$. The 
elements of the matrix Λ are similar to those of Φ; the non-vanishing 
sub-matrices of Λ are equal to $D(g_r)$. 

The characteristic equation of Λ is a product of n quadratic 
equations each of which is the characteristic equation of a 
sub-matrix. The eigenvalues of Λ are thus derived from quadratic 
equations; they are equal to 

$$\lambda_{2r-1} = \exp(g_r), \qquad \lambda_{2r} = \exp(-g_r) \tag{45.8}$$

From (45.8) and (43.13) the required relation between diagonal 
matrices in $2^n$ and 2n dimensions is derived. If the eigenvalues of 
K are denoted by $k_\nu$ this relation is given the form 

$$\log k_\nu = \frac12 \sum_{r=1}^{n} \pm\,|\log \lambda_{2r}| \tag{45.9}$$

where all possible sets of + and − signs are to be taken in order to 
find all possible values of $k_\nu$. 

If the eigenvalues of ΨΦ have the form (45.8) the eigenvalues of 
UW are determined by equation (45.9). 


46. Partition function 


In order to diagonalize the matrix ΨΦ it is to be transformed to a 
set of sub-matrices about the leading diagonal which have no row 
or column in common. Both Φ and Ψ are of this form but their 
sub-matrices do not coincide in rows and columns. Thus Ψ will be 
transformed to Φ; a matrix Ω is to be found such that 

$$\Psi(g) = \Omega\,\Phi(g)\,\Omega^{-1} \tag{46.1}$$

Taking account of (45.2) and (45.3) it is seen that Ω must have an 
'even' and an 'odd' part 

$$\Omega = \Delta + \Gamma$$

with no row or column in common. The even part Δ must have 
diagonal elements in even columns equal to unity and all its other 
elements must vanish. The elements of Γ must vanish in all even 
rows and columns; the remaining part is to be real orthogonal and 
should replace $P_r$ by $P_{r+1}$. This matrix is accordingly of a similar 
type as the matrix T of Section 30. 

Ω can be diagonalized by unitary transformation; for obvious 
reasons its eigenvalues must be the nth roots of unity, i.e. 1, η ... 
$\eta^{n-1}$, where η = exp (2πi/n). Let the unitary transformation matrix 
be denoted by Θ. Like Ω it must consist of an 'even' and an 'odd' 
part with no row or column in common. The even part must be 
unitary in order to transform Δ, virtually a unit matrix, into itself. 
Since also the odd part should be unitary the following set of matrix 
elements is admissible: 

$$\theta(2r,\,2r) = \theta(2r-1,\,2r-1)$$
$$\theta(2r,\,2r-2) = \theta(2r-1,\,2r-3)$$
$$\theta(2r,\,2r-4) = \theta(2r-1,\,2r-5)$$

The matrix Θ can be factorized in the form 

$$\Theta = I \times \Xi \tag{46.2}$$

where I is a unit matrix in two dimensions and Ξ is a unitary matrix 
in n dimensions. The matrix Φ admits a similar factorization: 

$$\Phi(g) = D(g) \times J_n \tag{46.3}$$

where D is defined by (44.11) and $J_n$ is a unit matrix in n dimensions. 
As direct multiplication and matrix multiplication commute with 
each other it follows from (46.2) and (46.3) that 

$$\begin{aligned}
\Theta\Phi &= (I \times \Xi)(D \times J_n) \\
           &= ID \times \Xi J_n = DI \times J_n\Xi \\
           &= (D \times J_n)(I \times \Xi) = \Phi\Theta
\end{aligned} \tag{46.4}$$

Thus it is seen that Θ commutes with Φ. 

If the matrix obtained by diagonalizing Ω is denoted by Ω′ we 
have 

$$\Omega = \Theta\,\Omega'\,\Theta^{-1}, \qquad \Omega^{-1} = \Theta\,\Omega'^{-1}\,\Theta^{-1}$$

It follows from (46.4) that 

$$\begin{aligned}
\Psi(a')\Phi(b) &= \Theta\Omega'\Theta^{-1}\,\Phi(a')\,\Theta\Omega'^{-1}\Theta^{-1}\,\Phi(b) \\
                &= \Theta\,\Omega'\,\Phi(a')\,\Omega'^{-1}\,\Phi(b)\,\Theta^{-1}
\end{aligned} \tag{46.5}$$

In this way the product ΨΦ is transformed to the required form of 
separate two-by-two sub-matrices without actually evaluating the 
transformation matrix. Also a knowledge of the matrix Θ is not 
required; it enters into (46.5) by way of a collineatory transformation 
and this cannot affect the eigenvalues of ΨΦ. Hence omitting the 
first and last factor of (46.5) diagonalization is now possible by 
elementary means. The matrix product 

$$\Omega'\,\Phi(a')\,\Omega'^{-1}\,\Phi(b)$$

consists of sub-matrices of two dimensions with no rows or columns 
in common. It is merely necessary to diagonalize the sub-matrices. 
They have the form 

$$\begin{bmatrix} \eta^{r} & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \cosh 2a' & i\sinh 2a' \\ -i\sinh 2a' & \cosh 2a' \end{bmatrix}
\begin{bmatrix} \eta^{-r} & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \cosh 2b & -i\sinh 2b \\ i\sinh 2b & \cosh 2b \end{bmatrix}$$

$$= \begin{bmatrix}
\cosh 2a'\cosh 2b - \eta^{r}\sinh 2a'\sinh 2b & i[\eta^{r}\sinh 2a'\cosh 2b - \cosh 2a'\sinh 2b] \\
i[\cosh 2a'\sinh 2b - \eta^{-r}\sinh 2a'\cosh 2b] & \cosh 2a'\cosh 2b - \eta^{-r}\sinh 2a'\sinh 2b
\end{bmatrix}$$

The determinant of the sub-matrices is unity; their trace, denoted by 
$2\cosh\alpha_r$, is equal to 

$$2\cosh\alpha_r = 2(\cosh 2a')(\cosh 2b) - 2(\sinh 2a')(\sinh 2b)\cos(2\pi r/n)$$

The eigenvalues of the sub-matrices are equal to $\exp(\pm\alpha_r)$, where r 
can take any integer value from 1 to n. The eigenvalues are of the 
form (45.8) with $\alpha_r$ substituted for $g_r$. The largest eigenvalue of 
UW is, in accordance with (45.9), determined by 

$$\log\gamma = \frac12\sum_{r=1}^{n}\cosh^{-1}\bigl[(\cosh 2a')(\cosh 2b) - (\sinh 2a')(\sinh 2b)\cos(2\pi r/n)\bigr] \tag{46.6}$$
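The determinant and trace of the two-by-two sub-matrices, and the sum in (46.6), lend themselves to a direct numerical check. In the sketch below the trace is written as 2 cosh α_r, so that the eigenvalues are exp(±α_r); `a1` stands for a′, η is exp(2πi/n), and the function names are hypothetical.

```python
import cmath
import math

def sub_matrix(a1, b, r, n):
    # Two-by-two product written out entry by entry; a1 stands for a'.
    eta_r = cmath.exp(2j * math.pi * r / n)
    c1, s1 = math.cosh(2 * a1), math.sinh(2 * a1)
    c, s = math.cosh(2 * b), math.sinh(2 * b)
    return [[c1 * c - eta_r * s1 * s, 1j * (eta_r * s1 * c - c1 * s)],
            [1j * (c1 * s - s1 * c / eta_r), c1 * c - s1 * s / eta_r]]

def alpha(a1, b, r, n):
    # cosh(alpha_r) is half the trace of the sub-matrix.
    arg = (math.cosh(2 * a1) * math.cosh(2 * b)
           - math.sinh(2 * a1) * math.sinh(2 * b) * math.cos(2 * math.pi * r / n))
    # The argument is never below 1 analytically; max() guards against rounding.
    return math.acosh(max(1.0, arg))

def log_gamma(a1, b, n):
    # Right-hand side of (46.6).
    return 0.5 * sum(alpha(a1, b, r, n) for r in range(1, n + 1))
```

For any admissible a′, b, r and n the determinant of `sub_matrix(...)` should come out as unity and its trace as `2 * math.cosh(alpha(...))`.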

Equation (46.6) is the answer to the problem of matrix algebra 
which was formulated in Section 42. Determination of the 
thermodynamical properties of the crystal requires additional 
mathematics details of which are given in the original paper by 
L. Onsager, 1944, Phys. Rev., 65, 117. It is not related to matrices, 
but concerns the evaluation of the sum in (46.6). 

In order to obtain the properties of a large crystal lattice the 
distinction between the two binding energies is abandoned so that 
ε′ = ε and a′ = a = ε/kT. The partition function per atom is equal 
to $\zeta^{1/mn}$ and determined by 

$$\frac{1}{mn}\log\zeta = \frac12\log\Bigl(2\sinh\frac{2\varepsilon}{kT}\Bigr) + \frac{1}{2n}\sum_{r=1}^{n}\cosh^{-1}\Bigl[\frac{\cosh^2(2\varepsilon/kT)}{\sinh(2\varepsilon/kT)} - \cos\frac{2\pi r}{n}\Bigr]$$

In the limit of large n the last term can be replaced by 

$$\frac{1}{4\pi}\int_0^{2\pi}\cosh^{-1}\Bigl[\frac{\cosh^2(2\varepsilon/kT)}{\sinh(2\varepsilon/kT)} - \cos t\Bigr]dt
= \frac{1}{2\pi}\int_0^{\pi}\cosh^{-1}\Bigl[\frac{\cosh^2(2\varepsilon/kT)}{\sinh(2\varepsilon/kT)} - \cos t\Bigr]dt$$

The most important result of this theory concerns the specific heat 
per atom which is found to show a logarithmic infinity at the 
temperature determined by sinh (2ε/kT) = 1, i.e. 2ε/kT = 0.88 or 
T = 2.27ε/k. Long-distance order breaks down at the 
same temperature. 
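The singular temperature and the limiting values of the partition function per atom can be explored numerically. The sketch below assumes ε′ = ε, writes x for ε/kT, and evaluates the integral over t from 0 to π by the midpoint rule; the argument of the inverse hyperbolic cosine reaches unity at t = 0 when sinh 2x = 1, i.e. 2x ≈ 0.88. The function names are hypothetical.

```python
import math

def critical_kT_over_eps():
    # sinh(2 eps / kT) = 1  =>  kT/eps = 2 / asinh(1)
    return 2.0 / math.asinh(1.0)

def log_zeta_per_atom(x, steps=2000):
    # x = eps/kT with eps' = eps; large-net limit in which the sum over r
    # is replaced by (1/2 pi) times an integral over t from 0 to pi.
    ratio = math.cosh(2 * x) ** 2 / math.sinh(2 * x)
    h = math.pi / steps
    integral = sum(math.acosh(max(1.0, ratio - math.cos((i + 0.5) * h)))
                   for i in range(steps)) * h
    return 0.5 * math.log(2 * math.sinh(2 * x)) + integral / (2 * math.pi)
```

Two limits serve as a plausibility check: at high temperature (small x) the result tends to log 2, the value for independent two-valued variables, and at low temperature (large x) it tends to 2x, corresponding to two satisfied bonds per atom.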


The lengthy deductions of this chapter demonstrate the power of 
matrix methods in a somewhat unusual field. 


CHAPTER 11 


OUTLOOK ON QUANTUM PHYSICS 


47. Subject of the theory 


Distinction between the facts as described by a physical theory and 
the mathematics employed in that description is never easy. In 
quantum physics it is even harder, since both the facts and the 
mathematical techniques are of an unusual kind. 

In this chapter the mathematics of quantum physics is presented 
in such a way that its interpretation is made as simple as possible. 
Matrix algebra is used as it is particularly suitable for the present 
purpose. It would, however, not be sufficient for establishing the 
complete mathematical equipment of quantum physics. 

First the experimental foundations of quantum physics will be 
reviewed; it is assumed that readers are familiar with these so that a 
short summary is sufficient. 

The wave nature of electromagnetic radiation, in particular of light 
and of X-rays, is well established by diffraction and interference 
experiments. Nevertheless the photoelectric effect and Compton effect 
do not fit the properties of wave fields. Radiation behaves in these 
effects like a beam of light particles (photons) which carry energy 
and momentum and transmit these entities to material particles 
by the way of collisions. In thermal equilibrium the distribution of 
energy over the frequency spectrum is determined by Planck's law 
which at high frequencies describes the energy distribution of a 
photon gas and at low frequencies the energy distribution in an 
assembly of classical waves. 

Electrons, protons and other material particles can be localized 
and move along trajectories, broadened images of which are 
observed in cloud chambers. Nevertheless they can penetrate potential 
barriers of energies higher than their own. On interaction with crystal 
lattices they show interference patterns not different from those of 
X-rays. They have accordingly wave properties. 

The energy of atoms and other compound systems is restricted to 
discrete energy levels the existence of which is demonstrated 
experimentally by way of collisions [...]. Energies between levels are 
apparently not admissible; on the other hand transitions between 
these levels do occur, without passing through the intermediate range. 

[...] account for some of the observations in [...] models. It seems 
on the other hand impossible to render these models mutually 
compatible. 

[...] in giving a coherent account [...] paved the way for new 
discovery [...] a reinterpretation of the concept of movement, or more 
generally, of change in time. In classical physics the movement of a 
particle is specified by regarding the position vector as a continuous 
function of time; variations of other mechanical or electrical 
quantities are specified in terms of continuous time functions. In 
quantum physics a dynamical variable is either a constant of the 
motion, such as the energy of a conservative system, or it is 
indeterminate. Indeterminacy replaces any continuous variation in 
time. The value of an indeterminate quantity can be found by 
measurement to any desired accuracy. However, the result of a 
measurement does not [...] the question what is the probability [...] 
any particular value under given experimental conditions. In this 
context the question arises whether the values of the energy or of any 
other dynamical variable are restricted and, if so, to which values 
they are restricted. These questions will be partially answered in the 
next section. 


48. Matrix mechanics 


[...] coordinates and components of momentum of the particles 

$$E = H(q^{(1)}, q^{(2)}, \ldots, p^{(1)}, p^{(2)}, \ldots)$$

the 'Hamilton function'. By solving the equations of motion the 
coordinates and components of momentum are obtained as functions 
of the time. 

If a system of this kind is to be considered from the point of view 
of quantum physics the numerical values of the coordinates and 
components of momentum must not be functions of the time. On 
the other hand there is reason to believe that the mechanical equa- 
tions of motion do retain some significance. In order to meet these 
two requirements the dynamical variables are represented by 
matrices, in fact Hermitean matrices, which are time dependent and 
which are related by the mechanical equations of motion. 

At this stage no attempt is made to decide whether or not the 
energy should be restricted to discrete values. Rather, the restriction 
to discrete levels is accepted as an experimental fact. The energy is 
then represented as a diagonal matrix with diagonal elements equal 
to the discrete energy levels of the system. Degeneracy is admitted; 
thus two or more diagonal elements may be equal to one and the 
same value of the energy. 

Dynamical variables other than the energy are represented by 
matrices which are usually not of diagonal form. The rows and 
columns of these matrices correspond to the rows and columns of the 
energy matrix. If A is a matrix representing a dynamical variable, 
element $a_{ik}$ is accordingly connected with two values of the energy 
$E_i$ and $E_k$ (which in the case of degeneracy may be equal to each 
other). These matrix elements also depend on the time t and are 
given by 

$$a_{ik} = b_{ik}\exp[2\pi i(E_i - E_k)t/h] \tag{48.1}$$

where $b_{ik}$ is independent of the time and h (= 6.625 × 10⁻²⁷ erg sec) 
is Planck's constant. Obviously the diagonal elements are 
time-independent. 

It is further assumed that the matrices representing Cartesian 
coordinates or components of momentum commute with each other, 
except those pairs representing a Cartesian coordinate and its cor- 
responding component of momentum to which the following com- 
mutation rule applies: 


$$p^{(j)}q^{(j)} - q^{(j)}p^{(j)} = (h/2\pi i)I \tag{48.2}$$


In fact this commutation rule cannot be satisfied by finite matrices. 
Fortunately, however, it can apply to finite matrices asymptotically ; 
in a set of matrices of increasing numbers of dimensions it is usually 
possible to satisfy equation (48.2) everywhere except in a single 
element the significance of which diminishes with increasing number 
of dimensions. Thus the results of matrix algebra as obtained in the 
earlier chapters can be applied. 
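The reason why finite matrices cannot satisfy the commutation rule exactly is that the trace of pq − qp vanishes for any pair of finite square matrices, whereas the trace of a multiple of the unit matrix does not. The sketch below illustrates this with two arbitrary trial matrices; the matrices and helper names are purely hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def commutator(A, B):
    # AB - BA
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Two arbitrary symmetric trial matrices in three dimensions.
P_TRIAL = [[0, 2, -1], [2, 1, 3], [-1, 3, 4]]
Q_TRIAL = [[1, 0, 5], [0, -2, 1], [5, 1, 0]]
```

`trace(commutator(P_TRIAL, Q_TRIAL))` is zero, as it is for any other choice, so at least one diagonal element of pq − qp must deviate from h/2πi.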

In order to construct the matrices and derive the energy levels it is 
possible to proceed in the following way. Coordinate and momentum 
matrices are constructed in such a way that they comply with the 
multiplication rules; otherwise their choice is arbitrary. Using the 
classical Hamilton function the energy matrix is constructed as a 
function of the coordinate and momentum matrices. The resulting 
energy matrix can eventually be diagonalized; its eigenvalues are the 
required energy levels. 

This procedure is impracticable. Nevertheless in special instances 
the deduction of the energy levels is a relatively simple problem, 


49. Harmonic oscillator 


Let a particle of mass m be bound to a position of equilibrium by a 
restoring force that is proportional to its negative displacement. Its 
Hamilton function is 

$$H = (p^2/2m) + (2\pi^2 m\nu^2)q^2 \tag{49.1}$$

where m is the mass, q the displacement, p the momentum and ν the 
frequency. The equations of motion are 

$$\dot q = p/m, \qquad \dot p = -4\pi^2 m\nu^2 q \tag{49.2}$$

They are solved in terms of two constants of integration (a and b) by 

$$q = a\cos 2\pi\nu t + b\sin 2\pi\nu t \tag{49.3}$$

and the energy is 

$$E = 2\pi^2 m\nu^2(a^2 + b^2) \tag{49.4}$$

and, of course, independent of the time. 


In the early stages of quantum theory it had been assumed that 
the energy is restricted to discrete levels 


$$E_n = nh\nu \qquad (n = 0, 1, 2 \ldots) \tag{49.5}$$

This was a hypothesis which, in fact, marks the historical origin of 
quantum theory. 

It was further inferred that transitions between these energy levels 
could occur only in such a way that n increases or decreases by 
unity. 

It will now be shown that the theory of the harmonic oscillator 
can be derived as a special instance of matrix mechanics as laid down 
in the preceding section. The Hamilton function is to be regarded as 
a matrix, being a function of the matrices p and q. By (23.C) the 
eigenvalues of H must be non-negative as it is equal to the 
sum of two Hermitean matrices with non-negative eigenvalues. 

On account of the rule for differentiating matrices (cf. [...] 
and exercise 3, Chapter 1) and by (48.1) the second time derivative 
of the matrix q is given by a matrix $\ddot q$ consisting of the elements 

$$\ddot q_{ik} = -[4\pi^2(E_i - E_k)^2/h^2]\,q_{ik}$$

so that equation (49.2) turns into 

$$4\pi^2\{-(E_i - E_k)^2/h^2 + \nu^2\}\,q_{ik} = 0 \tag{49.6}$$

It follows that the diagonal elements $q_{ii}$ must vanish, and that 
non-diagonal elements $q_{ik}$ also vanish unless 

$$E_i - E_k = \pm h\nu \tag{49.7}$$

However, not all elements of the coordinate matrix can be zero. 
Accordingly the eigenvalues of H must have the difference shown on 
the right-hand side of (49.7) and take the form of an arithmetic 
progression. There must be a lowest energy $E_0$ as negative 
eigenvalues of H are ruled out (cf. 23.C). 

The elements of the coordinate matrix vanish except for $q_{10}$, 
$q_{21}$, $q_{32}$ ... and $q_{01}$, $q_{12}$, $q_{23}$ ... The coordinate and momentum 
matrices are related by equations (48.2) and (49.2). According to the 
latter equation the elements $p_{ik}$ vanish unless $q_{ik}$ differs from zero. 


Combining both equations it follows that 

$$\frac{2\pi i m}{h}\sum_k\bigl[(E_i - E_k)\,q_{ik}q_{kl} - (E_k - E_l)\,q_{ik}q_{kl}\bigr] = \frac{h}{2\pi i}\,\delta_{il}$$

The diagonal element of this equation reduces to 

$$|q_{n,n+1}|^2 - |q_{n,n-1}|^2 = h/8\pi^2 m\nu \tag{49.8}$$

as all other terms in the sum vanish. On account of this equation the 
moduli square of the elements of the q-matrix form an arithmetic 
series which continues to infinity without any gap. It must, however, 
have a lowest term since the moduli square could not be negative. 
This term is equal to 

$$|q_{10}|^2 = h/8\pi^2 m\nu \tag{49.9}$$

The elements of q determine the elements of p. If (49.1) is 
interpreted as a matrix equation and q and p are inserted it may be 
concluded that the eigenvalues of H form an unbroken 
arithmetic series from the lowest value to infinity. 

The lowest energy level is determined by 

$$E_0 = \sum_k\bigl[(1/2m)\,p_{0k}p_{k0} + 2\pi^2 m\nu^2\,q_{0k}q_{k0}\bigr]
     = (1/2m)\,|p_{01}|^2 + 2\pi^2 m\nu^2\,|q_{01}|^2 = \tfrac12 h\nu \tag{49.10}$$

Hence 

$$E_n = (\tfrac12 + n)h\nu \qquad (n = 0, 1 \ldots) \tag{49.11}$$

This is similar to the original conjecture of (49.5), the difference being 
the 'zero-point energy', which prevents the oscillating particle from 
reaching a state of rest. 

The coordinate and momentum matrices consist of the elements 

$$q_{n+1,n} = q^{*}_{n,n+1} = \Bigl[\frac{h(n+1)}{8\pi^2 m\nu}\Bigr]^{1/2}\exp(2\pi i\nu t + i\varphi_n)$$

so that 

$$p_{n+1,n} = p^{*}_{n,n+1} = i\,[\tfrac12 hm\nu(n+1)]^{1/2}\exp(2\pi i\nu t + i\varphi_n) \tag{49.12}$$

where the $\varphi_n$ are real but otherwise arbitrary phase constants. 

In this way the deduction of the energy levels and construction of 
the coordinate matrix are completed. 
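The oscillator matrices can be written down for a finite number of levels and inserted into the Hamilton function. The sketch below uses elements of the form q(n+1, n) = [h(n+1)/8π²mν]^(1/2) and p(n+1, n) = i[½hmν(n+1)]^(1/2) at t = 0 with all phase constants zero; the truncation to `size` rows, the default units h = m = ν = 1 and the function names are assumptions made for the illustration.

```python
import math

def oscillator_matrices(size, h=1.0, m=1.0, nu=1.0):
    # Truncated q and p at t = 0 with all phase constants set to zero.
    q = [[0j] * size for _ in range(size)]
    p = [[0j] * size for _ in range(size)]
    for n in range(size - 1):
        q_el = math.sqrt(h * (n + 1) / (8 * math.pi ** 2 * m * nu))
        p_el = 1j * math.sqrt(0.5 * h * m * nu * (n + 1))
        q[n + 1][n], q[n][n + 1] = q_el, q_el
        p[n + 1][n], p[n][n + 1] = p_el, p_el.conjugate()
    return q, p

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def hamiltonian(q, p, m=1.0, nu=1.0):
    # H = p^2/2m + 2 pi^2 m nu^2 q^2, read as a matrix equation.
    p2, q2 = matmul(p, p), matmul(q, q)
    c = 2 * math.pi ** 2 * m * nu ** 2
    return [[x / (2 * m) + c * y for x, y in zip(rp, rq)] for rp, rq in zip(p2, q2)]

def commutator_diagonal(q, p):
    # Diagonal of pq - qp, to be compared with h/(2 pi i).
    pq, qp = matmul(p, q), matmul(q, p)
    return [pq[i][i] - qp[i][i] for i in range(len(q))]
```

With six rows the matrix H comes out diagonal with the elements (n + ½)hν for n = 0 ... 4; only the last row is distorted by the truncation, and the same holds for the diagonal of pq − qp, which equals h/2πi everywhere except in the last element.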

It may be noted that the non-vanishing elements of the coordinate 
matrix correspond to those pairs of levels between which transitions 
are admissible. The reason for this coincidence is a relation between 
transition probabilities and the moduli of matrix elements; its 
deduction would require greater details of the theory than can be 
presented here. 


50. Interpretation 


So far matrices have been introduced in a formal way. Physical 
meaning was attached only to the eigenvalues of the energy matrix 
which are shown to be the admissible energy levels of the quantum 
mechanical system. 

The diagonal elements of matrices with non-vanishing elements 
off the diagonal admit another simple interpretation. If A represents 
a dynamical variable a, the elements $a_{ii}$ are the statistical expectation 
of a, on the assumption that the energy is $E_i$. In terms of observations 
the meaning is as follows. 

First the object is brought to such conditions that its energy is 
$E_i$; this can, if necessary, be checked by measurement. Subsequently 
a is measured and the result is recorded. By the process of 
measurement the energy has been changed. Thus, before repeating the 
measurement of a the object has to be restored to the energy $E_i$; only 
then is the measurement of a repeated. If a number of measurements 
have been taken the arithmetic mean of the results should be equal 
to $a_{ii}$. 

If the energy is degenerate, the same procedure is possible if the 
control of the energy is supplemented by the control of alternative 
constants of motion. 

Non-diagonal elements of matrices do not determine the 
expectation of dynamical variables but they contribute to the 
expectations of the higher powers. Even more important is their 
relation to transition probabilities and collision cross sections; these 
relations are, however, neither of a general kind nor could they be 
derived by any simple argument. 

Additional means of interpretation are available if unitary trans- 
formations are admitted. The transformation matrices may depend 
upon the time. The transformed energy matrix is no longer diagonal 
and the time dependence of the transformed matrices is, in general, 
not sinusoidal. By transformations of this kind matrices other than 
the energy matrix can be diagonalized. Their eigenvalues are inter- 
preted as the possible results of the measurement of the dynamical 
variable which the matrix represents. 

Consider again the above repeated measurements of the dynamical 
variable a. Let U be unitary and let $A = U A' U^{-1}$, where $A'$ is a
diagonal matrix with diagonal elements $a'_k$. Then

$$a_{jj} = \sum_k u_{jk}\, a'_k\, u_{jk}^* = \sum_k a'_k\, |u_{jk}|^2 \tag{50.1}$$


As $a_{jj}$ is the statistical expectation of a random variable which
can take the values $a'_k$, and since $|u_{jk}|^2$ satisfies
$|u_{jk}|^2 \ge 0$ and

$$\sum_k |u_{jk}|^2 = 1 \tag{50.2}$$


the coefficients $|u_{jk}|^2$ are interpreted as probabilities. They
determine the probability that the value $a'_k$ is obtained in any
measurement of a when it is known that the energy is $E_j$. In the
conventional terminology U is the transformation matrix from the
‘eigenstates’ of a to the eigenstates of the energy. The term
‘eigenstate’ signifies conditions in which the measurement of a
dynamical variable yields with certainty a specified result. If the
transformation matrix can be deduced from the theory, the above set
of measurements is better determined than if only $a_{jj}$ were known.
The elements of the transformation matrix determine the probability
distribution of all possible results of the experiment.
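
The relations (50.1) and (50.2) can be checked numerically on a small
case. The following is a minimal sketch, not taken from the text: the
2 x 2 unitary matrix U, the angles theta and phi, and the values in
a_prime are all hypothetical choices made for illustration, and the
code counts indices from 0 rather than 1. It forms $A = U A' U^{-1}$,
compares both sides of (50.1), and then imitates the repeated
measurements by random draws with the probabilities $|u_{jk}|^2$.

```python
import cmath
import math
import random

def dagger(m):
    """Return the conjugate transpose of a square matrix (nested lists)."""
    n = len(m)
    return [[complex(m[c][r]).conjugate() for c in range(n)] for r in range(n)]

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical 2 x 2 unitary matrix U, built from two arbitrary angles.
theta, phi = 0.7, 0.3
c, s = math.cos(theta), math.sin(theta)
U = [[c, -s * cmath.exp(1j * phi)],
     [s * cmath.exp(-1j * phi), c]]

# Assumed possible values a'_k of the variable a (the diagonal of A').
a_prime = [1.0, 3.0]
A_diag = [[a_prime[0], 0.0], [0.0, a_prime[1]]]

# A = U A' U^{-1}; for a unitary U the inverse is the conjugate transpose.
A = matmul(matmul(U, A_diag), dagger(U))

# Probabilities |u_jk|^2 and the right-hand side of (50.1) for each j.
prob = [[abs(U[j][k]) ** 2 for k in range(2)] for j in range(2)]
expectation = [sum(a_prime[k] * prob[j][k] for k in range(2)) for j in range(2)]

for j in range(2):
    print(f"j={j}: a_jj={A[j][j].real:.6f}"
          f"  sum_k a'_k |u_jk|^2={expectation[j]:.6f}"
          f"  sum_k |u_jk|^2={sum(prob[j]):.6f}")

# Repeated measurements at energy index j = 0, imitated by random draws.
random.seed(0)
samples = random.choices(a_prime, weights=prob[0], k=20000)
mean = sum(samples) / len(samples)
print(f"arithmetic mean of simulated results: {mean:.4f}")
```

The arithmetic mean of the simulated results approaches the diagonal
element of A for the chosen energy index as the number of draws grows,
which is the statement made above about repeated measurements.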

However, the above methods cannot cater for dynamical variables 
which are not restricted to discrete values and therefore do not give 
a complete picture. An alternative approach is outlined in the 
following. 
Matrices representing dynamical variables can be transformed in
such a way that they become independent of the time. In this case 
the change in time of the mechanical system is fully determined by 
the change in time of the transformation matrix.

Introductions to quantum mechanics usually start with a
time-independent representation of dynamical variables. The change in
time is specified in terms of a time-dependent ‘wave function’
$\psi_j(x, t)$. This is a complex function of the 3N coordinates of
the N particles of which the mechanical system consists; this set of
coordinates is denoted by x. The wave function is also associated with
a discrete energy level of energy $E_j$. By means of multipliers which
are independent of the coordinates it is possible to normalize the
wave functions in such a way that

$$\int |\psi_j|^2 \, dx = 1 \tag{50.3}$$


the multiple integral being taken over the whole range of the 3N 
coordinates. 

Although these functions bear no resemblance to matrices they 
play in fact the part of transformation matrices, transforming from 
eigenstates of particle positions to eigenstates of the energy. The 
normalization rule (50.3) is analogous to (50.2). The interpretation 
of wave functions is similar to the interpretation of transformation
matrices: $|\psi_j(x, t)|^2 \, dx$ is the joint probability for the
positions of the particles if the energy is equal to $E_j$.
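
The normalization rule and the probability interpretation can be
illustrated on the simplest case. The sketch below rests on
assumptions that are not part of the text: a single particle with one
coordinate confined to the range 0 to 1, with wave functions
proportional to sin(n pi x). The time factor is left out because it
is assumed to have modulus one and so does not affect the squared
modulus. The names psi and integrate are hypothetical helpers.

```python
import math

def psi(n, x):
    """Assumed wave function of level n for one particle in 0 <= x <= 1."""
    return math.sqrt(2.0) * math.sin(n * math.pi * x)

def integrate(f, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Normalization: the integral of |psi_1|^2 over the whole range is 1.
norm = integrate(lambda x: abs(psi(1, x)) ** 2, 0.0, 1.0)

# Probability of finding the particle in the middle third of the range
# when the energy is that of the lowest level.
p_middle = integrate(lambda x: abs(psi(1, x)) ** 2, 1.0 / 3.0, 2.0 / 3.0)

print(f"norm = {norm:.6f}")
print(f"P(1/3 < x < 2/3) = {p_middle:.6f}")
```

For this assumed wave function the second integral can also be done
by hand and equals $1/3 + \sqrt{3}/(2\pi)$, so the particle is more
likely to be found in the middle third than a uniform distribution
would suggest.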

Dynamical variables are represented by operators involving
differentiation of the wave function with respect to the coordinates
and multiplication of the wave function by known functions of the
coordinates. The wave functions are determined by a partial
differential equation known as Schrödinger’s wave equation. This kind
of mathematics has little in common with that of the preceding section.
The interpretations are, however, virtually the same as before. If
$\mathcal{A}$ is the operator representing the dynamical variable a,
the expression

$$a_{jk} = \int \psi_j^*(x, t)\, \mathcal{A}\, \psi_k(x, t) \, dx$$

($\psi_j^*$ being the complex conjugate of $\psi_j$) is an element of
the matrix A, so that the interpretation of operators is reduced to
the interpretation of matrices. In particular the diagonal elements

$$a_{jj} = \int \psi_j^*(x, t)\, \mathcal{A}\, \psi_j(x, t) \, dx$$


are again the statistical expectations of a if $E_j$ is the energy. (If
the level $E_j$ is degenerate it is associated with two or more wave
functions; it may, however, be possible to identify any particular one
of these functions by recording the value of the energy and some other
constants of motion.)
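
The integrals defining the matrix elements can be evaluated
numerically in a simple case. The sketch below is an illustration
under assumptions not made in the text: one particle with one
coordinate in the range 0 to 1, levels $E_n = n^2\pi^2/2$ in units
with $\hbar = m = 1$, wave functions carrying the time factor
$e^{-iE_n t}$, and the position x itself as the dynamical variable, so
that the operator is multiplication by x. The names energy, psi and
element are hypothetical.

```python
import cmath
import math

def energy(n):
    """Assumed levels E_n = n^2 pi^2 / 2 (units with hbar = m = 1)."""
    return 0.5 * (n * math.pi) ** 2

def psi(n, x, t):
    """Assumed wave function of level n, including its time factor."""
    return (math.sqrt(2.0) * math.sin(n * math.pi * x)
            * cmath.exp(-1j * energy(n) * t))

def element(j, k, t, steps=4000):
    """Midpoint-rule value of the integral of psi_j^* x psi_k over 0..1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += psi(j, x, t).conjugate() * x * psi(k, x, t)
    return h * total

for t in (0.0, 0.1):
    a11 = element(1, 1, t)
    a12 = element(1, 2, t)
    print(f"t={t}: a_11={a11.real:.6f}  a_12={a12:.6f}  |a_12|={abs(a12):.6f}")
```

In this example the diagonal element stays at 1/2, the middle of the
range, at both times, while the non-diagonal element keeps a constant
modulus and only its phase changes with the time.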
If $\mathcal{H}$ is the operator representing the Hamiltonian and if
the wave functions comply with the conditions

$$\mathcal{H} \psi_j = E_j \psi_j$$

and

$$\int \psi_j^* \psi_k \, dx = \delta_{jk}$$

the energy matrix is a diagonal matrix with the admissible energy
levels in the diagonal.
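
The step from the two conditions to the diagonal energy matrix is
short. As a sketch, let $\mathcal{H}$ denote the operator representing
the Hamiltonian and let $H_{jk}$ (a label assumed here) denote the
elements of the energy matrix, formed in the same way as the elements
of any other matrix representing a dynamical variable. Then

$$H_{jk} = \int \psi_j^* \, \mathcal{H} \psi_k \, dx = E_k \int \psi_j^* \psi_k \, dx = E_k \, \delta_{jk}$$

so that all non-diagonal elements vanish and the j-th diagonal element
is equal to $E_j$.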

The mathematical methods for setting up and solving these equations
are outside the scope of this book. This preview of quantum physics
has accordingly to be concluded at this stage.

Unitary transformations, matrix elements, traces and eigenvalues
are standard concepts in quantum physics. For this reason matrix
algebra plays in the physics of the present time a role similar to
that which the calculus played in the physics of the eighteenth and
nineteenth centuries.